Arnold Gesell’s Developmental Assessment Revalidation Substantiates Child-Oriented Curriculum

Educators and parents are increasingly concerned about effects of high-stakes testing on children who may not be developmentally ready to perform tasks according to rigorous standards of today’s kindergartens. In response to this issue, and to provide new psychometric data for the Gesell Developmental Observation (GDO), Gesell Institute conducted a nationwide study with nearly 1,300 children aged 3 to 6 years. Results confirm that children are able to perform developmental items according to a sequential trajectory of increasing difficulty, relative to their chronological age in 6-month intervals, and that performance mastery on items does not occur at the same time for all children of the same age. Results support the continued use of the revised GDO, now named the Gesell Developmental Observation–Revised (GDO-R), as an instrument to determine a child’s developmental level along a continuous path of growth and learning. Also discussed is the importance of establishing effective and appropriate academic goals based on a child’s developmental assessment results.


Introduction
Promoting academic expectations that are inappropriately aligned with a child's development creates unrealistic levels of achievement for young children. As the problematic No Child Left Behind legislation saw many children indeed left "behind," American educators and policy makers advocated for more consistent and rigorous standardized learning benchmarks for younger children, resulting in the Common Core State Standards for Kindergarten to Twelfth Grade in 2010. While these standards represent a widespread initiative to try to close the achievement gap and provide accountability through test scores, many early childhood advocates question their appropriateness and whether or not they are achievable.
Many policy and decision makers appear to be obsessed with the educational idea that sooner is better when it comes to learning core knowledge. Jean Piaget (1896-1980), an influential Swiss child psychologist, referred to the American fixation that children should learn "sooner and faster" as "the American disease" (Guddemi & Zigler, 2011). Teaching academic tasks to children at earlier ages will not result in greater learning for the vast majority of children, due to the developmental trajectory of child development and individual differences among children. Furthermore, increased testing will not help or hasten the process. In fact, testing is very unreliable with young children under age eight. Experts have determined there is a 40% correlation between intelligence tests administered prior to kindergarten and results on achievement tests in third grade (Kim & Suen, 2003). However, informed parents, educators, and advocates of young children seek effective ways to establish academic goals for young children which correspond appropriately to developmental level, including social, emotional, and adaptive capacities.
A tool that addresses this need is the Gesell Developmental Observation-Revised (GDO-R), an instrument that evaluates a child's performance on a series of developmental and academic tasks in relation to the sequential ages and stages of typical child development in the cognitive, motor, language, and social/emotional/adaptive domains. The results of the GDO-R reveal a child's overall Developmental Age and Performance Level Ratings in each of the four domains at a point in time. Developmental Age is an age in years and half years which best describes a child's collective behavior and performances on a developmental scale. A Developmental Age may differ from the child's chronological age, being lower, higher, or the same. Knowing each child's Developmental Age enables educators to customize developmentally appropriate academic experiences and expectations to best meet the learning needs of every child. Arnold Gesell, PhD and MD, developed an assessment of human development, identifying the ages and stages of child development based on his maturationist theory (Gesell, 1925). He published the original Gesell assessment, known today as the GDO-R. It was updated in 1940 and 1965. In 1979 Ilg published Gesell Institute's The Child From One to Six: Evaluating the Behavior of the Preschool Child with updated technical data for the GDO. In 2011, after a nationwide study of 3- to 6-year-old children, the newly revalidated and revised GDO-R was published. This article presents the psychometric results for each of the tasks on the GDO-R, indicating the typical ages at which specific developmental and academic tasks, needed for realistic, developmentally appropriate success in kindergarten, are mastered. Reliability and validity evidence are also reported to support the continued use of the GDO-R as a developmental assessment for children aged 3 to 6 years.
Three tasks are discussed to illustrate the need to establish effective and appropriate academic goals based on a child's developmental assessment results.

Literature Review
Experts have stressed the significance of the kindergarten year as it relates to the child's development and the child's ability to succeed within the school environment. Kindergarten sets the tone for learning and future school success (Black, 2008; Guddemi & Zigler, 2011). Embarking on new learning creates numerous opportunities for the development of the child not only in areas of cognitive, social, emotional, and physical growth, but also as an individual within a community. Life-long, vital skills are acquired through the learning opportunities presented within the kindergarten environment.
With an increased emphasis placed on rigorous new standards and accountability, educators and parents are faced with new challenges relating to school readiness and the kindergarten curriculum. Schools play an important role in readiness; however, various schools have different expectations regarding readiness. A child may be considered prepared for one school environment and not prepared for another based on that particular school's expectations for readiness (Maxwell & Clifford, 2004). Parents and educators are concerned due to the increase of pressures and demands within early learning environments. The academic expectations of today's kindergarten are set at levels similar to the achievement levels of first grade 20 years ago (Almon & Miller, 2011; Miller & Almon, 2009).
Although there is much discussion related to the readiness of incoming kindergarten children, it is a school's responsibility to educate children who are legally of age to attend school. Most states require children to attend school by a certain age regardless of their readiness or skill level. On the flip side, there is also a need for schools to be ready for the child. It is widely accepted that school readiness is multidimensional (Ewing Marion Kauffman Foundation, 2002; National Education Goals Panel, 1997) and encompasses the following areas: physical well-being and motor development, social and emotional development, language development, approaches to learning, and cognition and general knowledge (National Education Goals Panel, 1997). Therefore, to be ready for the child, all of these areas must be addressed by the school. Furthermore, it is essential that schools, communities, and families acknowledge gaps in each child's educational abilities that can occur based not only on individual differences in normal development but also on such factors as birth weight, nutrition, television viewing, parent-child ratio, children's exposure to language and literacy, and parental involvement and participation in the child's well-being.
Despite a national focus on early childhood education, current research suggests that educational gaps continue to exist and that achievement gaps occur prior to the beginning of elementary school (Langham, 2009). It has been suggested that high-quality early education in combination with high-quality kindergarten through third-grade programs plays a critical role in attempting to close educational gaps and potentially contributes to enhancing the child's development, school readiness, and future school success. Long-term effects of a quality pre-kindergarten experience can affect grade retention, placement, special education, and school dropout rates (Barnett, 1993; Campbell, Ramey, Pungello, Sparling, & Miller-Johnson, 2002; Mead, 2008). Unfortunately, not all children have the opportunity to participate in a high-quality pre-kindergarten program.
A consistent characteristic of high-quality pre-kindergarten and kindergarten programs is the reliance on developmentally appropriate practices (DAPs) for each child based on each child's needs. DAP is defined by the National Association for the Education of Young Children (NAEYC) as knowing where a child is developmentally, providing unique experiences based on his or her stage of development that are both challenging and achievable, and possessing knowledge about how young children learn. The organization supports educators in "promoting young children's optimal learning and development" (Copple & Bredekamp, 2009, p. 16). NAEYC's DAP encourages educators to provide learning opportunities that will enhance all areas of a child's development and to understand that a child's development follows a well-documented, sequential order. DAP is based on being aware that each child develops at his or her own unique rate and that learning opportunities need to be challenging, but within the child's ability. Understanding a child's development is key to setting expectations that are appropriate and to planning curriculum that meets the child's needs and abilities. Early educators need to adhere to methods and practices of teaching that foster a child's development with learning being concentrated in all areas of development: cognitive, social, emotional, language, and physical (Kagan & Reid, 2009).
To plan DAP, assessing where a child is on the path of development is essential in determining what experiences a child is ready for. This use of readiness assessments should never exclude children from learning opportunities, but rather help determine how and what educational and learning experiences should be developed and/or modified to meet the child's developmental level (Gullo, 2005). Through documentation and assessment, an educator is better able to understand the child. These instruments offer insight into a child's development and his ability to learn, making learning visible to the educator (Seitz, 2008). As learning is multidimensional, it is important for an assessment to be used as a tool to help educators better understand children, their development, and how they learn (Tomlinson, 2008).

About the GDO-R
The GDO-R is a standardized, performance-based, criterion-referenced developmental assessment tool. It is designed for children from 2½ to 9 years of age and is used to inform educators and parents about a child's progress on developmental continuums. This information helps to set appropriate expectations for performance as well as instruction for children based on their developmental stage or level. When combined with the Parent/Guardian Questionnaire (PQ) and the Teacher Questionnaire (TQ), the GDO-R functions as a comprehensive assessment system. The GDO-R can also help determine whether or not a child may need further diagnostic evaluations to suggest appropriate planning or remediation in specific areas of development.
The purpose of the GDO Study was to provide updated technical data and reliability evidence for 17 of the 19 original tasks on the ©2007 GDO (see Table 1); two tasks that were intended for assessing older children, Right and Left and Visual III, were omitted. Another purpose of the GDO Study was to define Overt Behavior (Task 20) and to strengthen the social/emotional/adaptive domain (Task 21). The criterion that shaped this study, and subsequent outcomes, was based on three sources of information:

Method
The GDO study consisted of several sub-studies designed to collect both quantitative and qualitative data. Quantitative data were collected on children in seven age bands (spanning ages 3-6 years) and are reported here. Examiners administered a total of 167 items in one-on-one sessions with children. Data were also collected for each child from the teacher's observation of the child in the classroom (45 items) and from the parent's observation of the child at home (78 items). Qualitative data were collected on two developmental tasks on the GDO, the Copy Forms and Incomplete Man, as part of the Gesell Institute National Lecture Staff (NLS) Review Study. The purpose of the NLS Review Study was threefold: to collect data on the qualitative features of each developmental stage for Copy Forms and Incomplete Man, to establish inter-rater reliability for each Copy Form item, and to establish inter-rater reliability for assigning a Developmental Age to Copy Forms and Incomplete Man samples.

GDO Study Timeline
The GDO Study was completed over the course of 3 years following American Educational Research Association (American Educational Research Association, American Psychological Association, & National Council on Measurement in Education, 1999) standards. A timeline is presented in Figure 1 to provide context and sequence for the data activities and analyses of the GDO Study. The steps in the timeline are as follows.

Review of Child Development Literature
A comprehensive review of child development literature on observation, assessment methodology, and assessment instruments for children ages 2½ to 9 was undertaken. The GDO instrument was evaluated as a whole based on appropriateness of developmental tasks to early educational goals for young children, culturally sensitive measurement approaches, and consistency with developmental capabilities of children between the ages of 2½ to 9.

Content Validity
After a careful review of the literature and with the results of a Bias Review, the research team felt the items on the GDO instrument remained relevant with the exception of two items. The GDO Study was designed to collect updated technical and baseline data using the ©2007 GDO as the basis of the investigation to renew the reliability and validity evidence for the instrument. In addition, the PQ and the TQ were developed to address the social, emotional, and adaptive development of the child.

Table 1
Columns: Task No.; Task name; Description; Refer to Table(s)

Task 1. Cubes. [...] Steps with 10 cubes. The ability to reproduce the structures successfully and the approach to the item used by the child provides information about horizontal and visual perception, fine motor coordination, attention span, spatial judgment, and short-term memory.

Task 2. Interview. A child's responses to the series of questions related to home life (such as his or her favorite story or TV program) revealed expressive and receptive language skills, as well as the ability to recall everyday experiences. Responses provide a glimpse of the child's cognitive organizational skills, ability to stay on task, and ability to follow directions. While these are important GDO-R tasks, study data for Interview and Interests (Task 10) were not analyzed in aggregate since the examiner evaluated individual language samples across the entire assessment session, and scored the domain using a qualitative rubric.

Task 3. Name and Numbers. This set of items requires the child to first write his or her name and as many numerals (up to 20) as he or she can. Both this task and Task 4 (Copy Forms) evaluate a child's competence in integrating visual information with motor abilities, visual tracking skills, and discrimination abilities. The size, shape, and organization of the products drawn indicate maturity in fine motor ability, organizational skills, awareness of detail, visual perceptions, ability to execute angles, and overall eye-hand coordination. Refer to Tables 9, 10.

Task 4. Copy Forms. The child is asked to copy a Circle, Cross, Square, Triangle, Divided Rectangle, Vertical and Horizontal Diamonds, and ultimately 3-dimensional shapes (cube and cylinder) according to his or her age and demonstrated ability. Some age 6 children were not administered the first three items (Scribble, Horizontal, and Vertical Stroke) when the examiner believed the items were too easy for these children. In this case, the children received a missing score that was treated as incorrect for the analyses, leading to the lower p values for these items for age 6 children.

Task 5. Incomplete Man. This task requires a child to add missing symmetrical body parts to a given drawing. It measures fine motor skill, perceptual awareness, balance, symmetry, and spontaneous task completion.

Task 6. Right and Left. Not included in the GDO study. This task is intended for children aged 6 and older.

Task 7. Visual I. This visual discrimination task requires a child to match symbols presented one at a time on a card to the corresponding symbol on a worksheet. This task measures a child's competence in left-to-right directionality, visual discrimination, ability to sustain attention, to find one's place repeatedly, and to carry out directions. Refer to Table 13.

Task 8. Visual III. Not included in the GDO study. This task is intended for children aged 6 and older.

Task 9. Naming Animals. The child is encouraged to name all the animals he can think of. Responses provide information about a child's level of expressive and receptive language, retrieval skills, and cognitive organization processes. Recall, ability to conceptualize, attention to task, and classification skills are also observed in this 60 second timed task. Refer to Table 14.

Task 10. Interests. See description in Task 2: Interview.

Task 11. Prepositions. This item assesses the child's understanding of specific prepositional phrases and his or her ability to apply them to a corresponding action (placing a cube on, under, in back of, in front of, and beside a chair).

Task 12. Digit Repetition. This task requires the child to repeat a series of digits with increasing length. It measures auditory and short-term memory, as well as listening ability. As age increases, children's ability to repeat increasingly longer digit sets increases as well.

Task 13. Comprehension Questions. During the Comprehension Question task, the child is asked "What must you do when you are hungry, sleepy, cold, have lost something, or cross the street?" A child's performance in this area measures cognitive processes related to problem solving, personal experience, and knowledge and understanding of specific words and phrases.

Task 14. Color Forms. This task measures visual discrimination by asking a child to place cut-out shapes on a corresponding board. This task is designed to better differentiate performance at younger ages.

Task 15. Three-Hole Form Board. This item uses puzzle-like materials to measure a child's visual discrimination, depth perception, and spatial perceptual accuracy in a variety of orientations. This task is expected to better differentiate performance at younger ages.

Task 16. Action Agents. This item requires the child to generate a word (noun) that could produce the action suggested; e.g., "what cries or what runs?" Language comprehension skills are measured on this item. It requires a relatively long period of sustained attention to the task. Refer to Table 20.

Task 17. Identifying Letters and Numbers. (a) Identifying Letters: This task requires a child to identify random capital letters. This task is dependent on prior exposure and knowledge of the alphabet. (b) Identifying Numbers: This task requires a child to identify random numerals 1 to 12 by name. This task is dependent on exposure to and knowledge of numerals. Refer to Tables 21, 22.

Task 18. Numeracy (Counting, One-to-One Correspondence, Conservation, and Calculations). (a) Counting: Counting reveals the child's experience with and ability to remember numbers in a sequence. Children were allowed to count up to 40. (b) and (c) One-to-One Correspondence and Conservation: One-to-one correspondence evaluates the child's understanding that each item is represented only once by a number name. Children learn to count with one-to-one correspondence before they learn to conserve the same number of items; e.g., the child when asked, "how many altogether?" must know that the number of pennies that he or she just counted is still the same number.

Social, Emotional, and Adaptive. The GDO-R utilizes three subscales of the Teacher and Parent/Guardian Questionnaires to measure social interactions with adults and peers, ability to self regulate and cope with transitions, and self-help skills in daily life. Refer to Table 39.

Note. GDO-R = Gesell Developmental Observation-Revised.

Procedures for Reducing Bias
The five experts who reviewed the GDO for bias also evaluated content and age appropriateness for each task. The team was selected from the fields of early childhood education, special education, physical movement, and test development. Each reviewer was asked to respond to a set of specific questions to identify any biases inherent in the content or methodology of the GDO. The bias review questions can be found in the full Technical Report (Gesell Institute of Child Development, 2012) at www.gesellinstitute.org. Several GDO tasks raised issues due to a current lack of cultural relevancy. One item within the Interview task asked the child about his or her most recent birthday celebration and tested the ability to recall presents that he or she received. This question was omitted from the Interview task because it failed to allow for the breadth and complexity of birthday celebrations in different cultures and by families with lower socioeconomic circumstances. Another item in the Interview task prompts the child to name animals, presuming all children have had the experience of visiting a farm or zoo. This question was revised to "Have you ever been to or read a book about the zoo or a farm?" Also, a new question about watching television was added to the Interview to elicit more exchange with the child for a language evaluation.

Online User Survey
Prior to the commencement of the study, an online survey of GDO users was conducted to collect information on how customers use the GDO. A sample of N = 153 respondents provided feedback. A summary of the questions and quantitative results can be found in the full Technical Report (2012) Appendix A.

Focus Group
A focus group held at a Massachusetts preschool was moderated by a member of the Mid-Continent Research for Education and Learning (McREL) research team. The purpose of the focus group was to gather information on qualitative improvements to the GDO that would be most meaningful to examiners, in addition to the updated technical data, which were collected to renew the validity of the instrument.

TQ and PQ
It was recommended by all reviewers that the GDO-R include a measure of emotional regulation and social behavior by surveying the child's teacher and parent or guardian. After reviewing the literature on emotional, social, and adaptive behaviors and examining multiple existing parent and teacher questionnaires, the TQ and PQ were developed for the GDO Study.

Procedures for Recruiting Examiners and Distributing Materials
Each site that enrolled in the study signed an agreement and secured parental consent forms for each participating child (Gesell Institute of Child Development, 2012).
Trained examiners administered all designated items on the GDO to children within the study age band (see Table 2) in individual assessment sessions. A standardized script guided the examiner in the administration of each task. Examiners in the study did not score or determine a Developmental Age. Data were collected from three sources at each school: the child (GDO tasks-GDO), the parent or guardian (PQ), and the child's teacher (TQ). All GDO assessment forms and study materials were provided to each school free of charge, including return shipping and handling. A Training DVD was given to each site to standardize the training for all examiners in the study. A conference call was held between each school's examining team and the Gesell research team to review study protocol, answer questions, and offer support. Assistance was also supplied by phone, e-mail, and fax throughout the entire course of the schools' participation in the study. Each site returned hardcopies of all data forms to Gesell Institute for review, validation, cleaning, and data entry.

Research Sample
The research sample was primarily a sample of convenience drawn from a national population of typically developing children attending schools that administered the GDO. A subset of schools from New Haven, CT, also participated in the study. Examiners for these schools were trained at Gesell Institute, as the schools did not currently utilize the GDO but wanted to participate in the study.
The final sample for analysis for the GDO Study included 1,287 children from 53 geographically diverse sites in 23 states. While the assessment is designed for ages 2½ to 9 years, the researchers chose to limit hands-on data collection to the group of children that comprised the largest number of users of the GDO. The chronological age threshold for each age band is described in Table 2. The age bands targeted for data collection were 3, 3½, 4, 4½, 5, 5½, and 6.
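As a rough illustration of how a child could be assigned to one of the seven age bands, the sketch below encodes the thresholds reported for Table 2: each band opens 3 months before the nominal age and closes one day short of 3 months after it. The function names and the completed-months convention are assumptions made for illustration; they are not part of the GDO study protocol.

```python
from datetime import date

# (band label, lower bound in completed months). Each band spans 6 months,
# e.g. band "3" covers 2 years 9 months 0 days to 3 years 2 months 29 days.
AGE_BANDS = [("3", 33), ("3½", 39), ("4", 45), ("4½", 51),
             ("5", 57), ("5½", 63), ("6", 69)]
BAND_WIDTH_MONTHS = 6


def completed_months(birth: date, assessed: date) -> int:
    """Whole months elapsed between birth and assessment (assumed convention)."""
    months = (assessed.year - birth.year) * 12 + (assessed.month - birth.month)
    if assessed.day < birth.day:
        months -= 1  # the current month is not yet complete
    return months


def age_band(birth: date, assessed: date):
    """Return the age band label, or None if the child falls outside all bands."""
    months = completed_months(birth, assessed)
    for label, start in AGE_BANDS:
        if start <= months < start + BAND_WIDTH_MONTHS:
            return label
    return None


if __name__ == "__main__":
    # Hypothetical child, aged 4 years 2 months at assessment.
    print(age_band(date(2005, 1, 15), date(2009, 3, 20)))
```

A child returning None here would correspond to the assessments the study set aside as too young or too old for the sample.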

Site Sample
The sample included a diverse group of sites in terms of the type of school, region of the country, size, and population served (i.e., ethnicity and percentage eligible for free and reduced lunch). Refer to Figure 2 for sample distribution by state and Tables 3 to 5 for demographics of participating sites.

Examiner Sample
One hundred and one trained GDO examiners, with a mean of 12 years teaching experience, collected GDO data from children at preschool and elementary schools across the nation. The group of examiners, with a mean of 7 years GDO experience, received GDO training in one of two ways: (1) in the last 5 years through a 3-day workshop or (2) a 1-day training session at the Institute. Of the examiner sample, 75% were currently teaching and 25% reported that they were retired, no longer teaching in a classroom, or were volunteer examiners. Approximately 88% of the GDO study examiners have a bachelor's, master's, or doctoral degree. Refer to Table 6 for examiner's level of education.

Data Validation and Entry Procedures
Data were systematically reviewed for completion, accuracy, and any possible serious administration errors prior to being entered into an electronic SurveyGizmo file. Unusable data were put aside. Ten percent of the data entered in each of the data sets (GDO, TQ, and PQ) was checked by a team of interns and any keystroke or scoring errors were corrected. A subsample of data collected in New Haven, CT, by a team of subcontractors was also checked for accuracy and reliability by members of the research team.
A total of 1,363 GDO assessments were submitted to Gesell Institute for evaluation. After careful review for accuracy, proper administration, and age requirements, a final sample of 1,287 GDO assessments was used in the final analysis. Thus, 76 assessments (about 5.6% of those submitted) could not be used and were removed from the original sample. Over the course of 2 years, communication with new sites about data collection and submission procedures improved significantly, which further reduced the number of invalid assessments (e.g., the child was too young or too old for the study) submitted by each site.
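The share of unusable assessments follows directly from the two counts reported above. The short check below is only an arithmetic illustration; the function name is hypothetical.

```python
def exclusion_summary(submitted: int, retained: int):
    """Return (number excluded, percent of submitted assessments excluded)."""
    if submitted <= 0 or not 0 <= retained <= submitted:
        raise ValueError("counts must satisfy 0 <= retained <= submitted, submitted > 0")
    excluded = submitted - retained
    percent = round(100 * excluded / submitted, 1)
    return excluded, percent


if __name__ == "__main__":
    # Counts reported for the GDO Study: 1,363 submitted, 1,287 retained.
    print(exclusion_summary(1363, 1287))
```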

Sample Descriptive Statistics
The final sample used for analysis was a proportional mix of boys and girls in each age band (3.0-6.0 years) from ethnically diverse backgrounds. The number of children in each age band varied (see Table 7).

Task Descriptive Statistics by Age Band
The p value represents the proportion of children who provided the correct responses to the item (scored with 0, 1). For polytomous items (scored with 3 or more score points; for example, Incomplete Man scored according to level of cues, or Cubes scored according to performance with or without a demonstration), the p value represents the average proportion of the maximum possible score children received. With few exceptions, Tables 8 through 29 show growth in performance on the GDO tasks across age bands. Some tasks/items have been organized into separate tables because they require different statistical operations to best demonstrate the findings. These are as follows:
1. Tasks that contain continuous items that are scored from 0 to a maximum number: the number of numerals written, and the number of those numerals that were in sequential order (Table 10); the number of animals named (Table 14); how high the child counted (Table 23); and the number of seconds for dominant hand pellets (Table 26). Also included are Social/Emotional/Adaptive Items (Table 30). Because the items are scored in terms of a maximum number, means and standard deviations provide better evidence than p values regarding children's performance.
2. Tasks that contain categories: items from Distinguishing Features and Overt Behavior. These items were analyzed using a frequency distribution, because means and standard deviations were not appropriate. These are not reported here.

Table 2. Chronological ages of children included in each band
3: 2 years 9 months and 0 days to 3 years 2 months and 29 days
3½: 3 years 3 months and 0 days to 3 years 8 months and 29 days
4: 3 years 9 months and 0 days to 4 years 2 months and 29 days
4½: 4 years 3 months and 0 days to 4 years 8 months and 29 days
5: 4 years 9 months and 0 days to 5 years 2 months and 29 days
5½: 5 years 3 months and 0 days to 5 years 8 months and 29 days
6: 5 years 9 months and 0 days to 6 years 2 months and 29 days
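The two kinds of summary statistic described in this section can be sketched as follows. This is a minimal illustration, not the study's analysis code: the function names are hypothetical, missing responses are represented as None and counted as incorrect (as the study reports doing), and the use of the sample standard deviation is an assumption, since the text does not say which form was used.

```python
from statistics import mean, stdev


def item_p_value(scores, max_score=1):
    """Average proportion of the maximum possible score on one item.

    For dichotomous items (max_score=1) this is the proportion correct.
    None marks a missing response and is counted as a score of 0.
    """
    if not scores:
        raise ValueError("at least one response is required")
    total = sum(0 if s is None else s for s in scores)
    return total / (max_score * len(scores))


def continuous_summary(values):
    """Mean and sample standard deviation for a continuous item (assumed form)."""
    return mean(values), stdev(values)


if __name__ == "__main__":
    # Hypothetical responses for illustration only.
    print(item_p_value([1, 0, 1, None]))               # dichotomous item
    print(item_p_value([2, 1, 0, None], max_score=2))  # polytomous item
    print(continuous_summary([10, 12, 14]))            # e.g. animals named
```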

Item p Values by Task and Age Band
Tables 8 through 29 provide results for the GDO-R items by task and age band. The tasks are described in order of test administration. Some item responses were missing when children did not provide an answer to an item or were not administered an item because they gave incorrect responses to the number of previous items that met the stop rule.
Missing item responses were treated as incorrect for these analyses.
The p values for items within a task tend to reflect the fact that items gradually increase in difficulty. For this reason, p values tend to be somewhat higher for earlier items and lower for later items. p values also reflect children's increased competency with age; p values are lower for the younger children and higher for the older children. Dr. Gesell utilized two levels for identifying developmental competence, successful or not successful, on items and tasks at a 51% benchmark. The results presented here utilize three Performance Level Expectations for each GDO-R task, rather than two. The criteria set for each expectation were established using a developmental framework of growth and learning in which a child acquires the capacity to succeed at more difficult items after having mastered the less complex items which precede them. They are shaded accordingly:
• Solid Expectation (SE): over 70% of the children could complete the task (dark gray shading)
• Qualified Expectation (QE): 50% to 69.9% could complete the task (light gray shading)
• Not Yet Expected: under 50% of the children could complete the task (no shading)
The Performance Level Expectations for each task reflect responses of a large group of children of the same age in the sample that were able to complete the task independently, without demonstrations or cues from the examiner. Typical GDO-R administration allows for demonstrations and cues for some items because it allows the examiner to differentiate between developmental levels and also reveals the child's approach to the task, especially when the child is challenged by an item or task. In this article, we report on the Solid Expectation level, at which over 70% of the children could complete the task.
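The three Performance Level Expectations amount to a simple threshold rule on the percentage of children in an age band who completed a task independently. The sketch below is illustrative only; the function name is hypothetical, and treating exactly 70% as a Solid Expectation is an assumption, because the stated ranges leave the interval between 69.9% and 70% unspecified.

```python
def performance_level(percent_completed: float) -> str:
    """Map the percent of children completing a task to an expectation level."""
    if not 0 <= percent_completed <= 100:
        raise ValueError("percent_completed must be between 0 and 100")
    if percent_completed >= 70:   # assumption: the boundary value counts as SE
        return "Solid Expectation"
    if percent_completed >= 50:
        return "Qualified Expectation"
    return "Not Yet Expected"


if __name__ == "__main__":
    # Hypothetical percentages for one task across age bands.
    for pct in (35.0, 62.5, 88.0):
        print(pct, performance_level(pct))
```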

Content-Related Validity
Content-related validity is evidenced by uniformity between task content and the developmental milestones widely accepted to precede instructional content in each area. To ensure such correspondence for the GDO-R, Gesell Institute conducted a comprehensive review of current child development theory and met with education experts to determine common educational goals and the knowledge and skills emphasized in today's early childhood curricula. The graphic design of the assessment and its manipulative materials reflect the types of activities found in early childhood classrooms and in children's everyday lives. An online user survey provided additional information regarding overall assessment effectiveness (addressing such topics as the appropriateness of the criteria for developmental age, ease of administration, and appropriateness for each age). These validation efforts resulted in an assessment that reflects the needs of classroom teachers, children, and parents.

Note. Missing data for the child's native language occurred when Parent/Guardian Questionnaires (PQs) were not returned. In such cases, a shortened form of the PQ (Child Demographic Form) was modified so that basic demographic information could still be collected from school records for participating families (sex, ethnicity, DOB, native language). Missing data for ethnicity are within the acceptable limit. Missing data for native language do not affect findings because all children were administered the assessment in English and had to be deemed fluent by the teachers who administered the assessment.

Inter-Rater Reliability
Four NLS members participated in the Qualitative Review Study. Three hold a master's degree in Early Childhood and/or Child Development, and one holds a bachelor's degree in Child Development. Collectively, the Qualitative Review Study team had over one hundred years of experience administering the GDO and conducting Gesell workshops on topics such as school readiness, parent involvement, and child development.
Inter-rater reliability of the GDO-Revised provides evidence regarding the degree to which Developmental Age can be reliably assigned. The inter-rater reliability study included a subsample of children's performance on the Incomplete Man and Copy Forms tasks. Table 31 describes the sample used in the inter-rater reliability study. The sample for Incomplete Man was smaller than the sample for Copy Forms, because some children were rated as unable to score by one or both raters.
Inter-rater reliability was calculated by comparing the agreement between the developmental ages assigned by Rater A and Rater B of Team 1 for each task. During Phase 1, Rater A and Rater B of Team 1 rated Copy Forms, while Rater A and Rater B of Team 2 rated Incomplete Man. Inter-rater agreement for assigning overall Developmental Age was calculated for the Copy Forms and Incomplete Man samples. Inter-rater agreement was also calculated for each individual Copy Form item in Phase 2.
During Phase 1, for both Incomplete Man and Copy Forms, neither team had access to the child's chronological age; the raters used only the actual work samples and process sheets of the children in the sample. Inter-rater agreement on developmental age, as measured by the Pearson product moment correlation, was high for both Incomplete Man and Copy Forms (see Table 32). These high correlations provide evidence that developmental age can be reliably assigned by trained raters using the GDO-R.
In addition, each rater was asked to rank order all the children in the sample by developmental age. The rank order correlation (Spearman rho) presented in Table 32 provides further evidence of the reliability of developmental ages as assigned by trained GDO-R administrators. The agreement between the two raters' overall developmental age rankings was high for both Copy Forms (.91) and Incomplete Man (.92), showing that raters ranked the children by developmental age very similarly.
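The two agreement statistics described above, the Pearson product moment correlation and the Spearman rank order correlation, can be sketched as follows. This is an illustrative computation only: the function names and the developmental ages assigned by the two raters are hypothetical and are not taken from the study data. Spearman's rho is computed here as the Pearson correlation of the ranks, with tied values receiving their average rank.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank order correlation: Pearson r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical developmental ages (in years) assigned to the same eight
# work samples by two raters; illustrative values only.
rater_a = [3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 4.0, 5.0]
rater_b = [3.5, 4.5, 4.5, 5.0, 5.0, 6.0, 4.0, 5.5]

print("Pearson r:", round(pearson_r(rater_a, rater_b), 2))
print("Spearman rho:", round(spearman_rho(rater_a, rater_b), 2))
```

Note that both statistics are undefined when either rater assigns the same value to every child, because the denominator is then zero; this sketch does not handle that case.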
Finally, to examine the degree to which the Developmental Age assigned by raters corresponded to the children's actual age (i.e., chronological age), the Pearson product moment correlations between Developmental Age and chronological age were calculated. Correlations were calculated separately for Rater A and Rater B. These correlations were high (range = .78-.82) and in the expected range, providing evidence that the assigned Developmental Ages corresponded closely, but not exactly, to children's chronological age (see Table 32). Perfect correlations are not expected because of the variation in development between children.
Table 33 presents the Phase 2 results of inter-rater reliability for Copy Forms items. Raters used the same children's work samples as were used in the examination of the reliability of Developmental Age (Phase 1). However, each team of raters that conducted the inter-rater reliability for Incomplete Man during Phase 1 subsequently conducted the inter-rater reliability for Copy Form items for Phase 2, and vice versa. For inter-rater reliability of individual Copy Forms items, raters also had access to children's chronological age, because this is the standard scoring practice. Sample sizes varied by item, because some children were rated as unable to score by one or both raters.
Note. In Cubes, examiners were instructed to administer each item, and provide a DEMO if required. Each item was scored using three categories: successfully completed without DEMO (2), successfully completed with DEMO (1), unsuccessful (0).
Note. The number of body parts included for the Incomplete Man task in the study was 13. These include Knot, Other-1, and Other-2, which are not customarily part of the standard Gesell Developmental Observation-Revised (GDO-R) administration. Thus, the mean number of body parts is relative to a denominator of 13, rather than 10.
Note. For the Incomplete Man task, examiners were instructed during data collection to use appropriate cueing if required, and to score each body part added in one of four categories: Body part added Spontaneously (3), Body part added following a General Cue (2), Body part added following a Specific Cue (1), or Body part not added at all (0).
The results in Table 33 indicate a strong correlation between raters for each Copy Form item. In addition, the means and standard deviations for Rater 1's and Rater 2's scores are very similar. It is important to note that for the Cube Face-on and Cube Point-on items, a proportionally large number of children were rated as unable to score by both raters. All children who could be scored received a score of 0, resulting in an inter-rater reliability of 1.00. Cube Face-on and Cube Point-on are some of the most difficult items in the entire GDO-Revised. Very few 6-year-old children in the entire study sample received a correct score on these items.
Table column headings (age band in years, with sample size): 3.0 (n = 47), 3.5 (n = 123), 4.0 (n = 170), 4.5 (n = 253), 5.0 (n = 248), 5.5 (n = 190), 6.0 (n = 142)
Note. In the Visual I task, the first item was a teaching item (triangle), and "recognizing the skip" was an item scored as part of the total (12). Standard deviations are indicated in parentheses.
Note. In the Naming Animals task, the total number of items in the task was predetermined to be 60 as a baseline for all age groups.
Note. Administration was terminated when the child unsuccessfully repeated two out of three digit sets in the row.

Limitations
The GDO Study contributes a comprehensive sample of child development data to the educational field at large. As with any study of its size and scope, it has limitations. The distribution of child ethnicity across the total sample more closely resembled the U.S. Census than did the distribution of child ethnicity in each age band. Thus, changes observed across age bands could be attributable to a shift in the sample as opposed to a definitive age shift. In the case of the PQ, the percentage of missing data for child's ethnicity was strongly mitigated by efforts on the part of the school and research team to gather this information from other school records (as reported by parents). However, since the native language of the child was also derived from the PQ, efforts to collect accurate information on a child's native language from the school were less fruitful, and resulted in higher percentages of missing data across age bands. In some sites, it was not possible to administer the PQ due to the nature of testing at the site. Specifically, where GDO-R tests were part of the admission protocol in private schools, the schools did not administer the GDO Study PQ because it contained questions that could be perceived to affect a child's eligibility for school acceptance (special evaluations, services, level of education of parent). In other sites, bussing of children in urban communities meant that parents did not physically come to the school to return a PQ or may have been reluctant to share such information with the school administration.
Note. In the Three-Hole Form Board task, children were given four presentations of the board, each rotated 180 degrees while keeping the board parallel to the table (i.e., the board was not flipped over). Each presentation was scored using three categories: successfully completed (2), successfully completed with Trial and Error (1), or Unsuccessful (0).
Table column headings (age band in years, with sample size): 3.0 (n = 51), 3.5 (n = 129), 4.0 (n = 176), 4.5 (n = 258), 5.0 (n = 254), 5.5 (n = 189), 6.0 (n = 140)
Note. The number of Action Agents includes both Mews and Meows, as both were tested in the study protocol. Thus, the mean number of Action Agents named correctly is relative to a denominator of 21, rather than 20.
While the examiners were trained carefully on the GDO-R task administration, they did not receive recording and coding rubrics to score the following observations of the child during all tasks in the assessment: Paper Position, Head Shift, Body Posture, Non-Dominant Hand Posture, and Eye Movement. This may explain why observational data on these items contain missing cases (examiners did not complete the section of the form). Thus, a shortcoming of this study is that these items cannot clearly be interpreted. However, the Qualitative Review Study and Inter-rater Reliability Study strongly confirm the developmental characteristics of each age band as related to the Copy Forms and Incomplete Man tasks. This is very important because it provides recent validity evidence for these specific developmental tasks and allows for continual improvements to the training of examiners.
Table column headings (age band in years, with sample size): 3.0 (n = 50), 3.5 (n = 125), 4.0 (n = 172), 4.5 (n = 229), 5.0 (n = 232), 5.5 (n = 184), 6.0 (n = 139)

Implications
The most valuable implication of this research is that the GDO-R has renewed reliability and validity evidence to support its continued use as a developmental instrument to evaluate growth and development of children aged 3 to 6 years and to inform instruction for developmentally appropriate activities. The results from this study also support the findings for developmental tasks as originally published by Arnold Gesell (Gesell, 1925). Children are developing and reaching the major developmental milestones at about the same time as they did when Dr. Gesell first started collecting data over a century ago. A few of the important implications of the research for educators nationwide include the following:
1. Perceiving oblique lines is a prerequisite to letter formation and writing, two essential expectations in the kindergarten curriculum of today. Building the Gate (Task 1: Cubes) and copying the Triangle (Task 4: Copy Forms) require that the child not only perceive the oblique angle of the cube or the form, but also be able to reproduce the structure in 3-D or on paper. The GDO study documents that this developmental capacity is solid only by age 5 (Task 1: Cubes-Gate) and 5.5 (Task 4: Copy Forms-Triangle). Educators must be alert to variations in both chronological age and developmental level to properly balance the pace and sequence of daily learning activities for each child.
2. Children correctly identify letters in the alphabet in a graduated process that is affected by age, experience, and exposure to the printed word. As such, the average 4.5-year-old can successfully identify approximately 12 letters of the alphabet, while a year later, at 5.5, the average child can identify 21 to 22 letters. Educators who attempt to teach writing letters before the age of 5.5 (when most children can perceive and execute the oblique lines of letters) are doing their young students a disservice, which may result in a child internalizing failed attempts at writing before his or her developmental capacity for the task exists. Taking the time to understand how developmental level can be leveraged for teaching will benefit both children and teachers.
3. Educators who are able to recognize when a child is beginning to conserve 10 or more items will likely find that the child can also begin to succeed at simple calculations that have final answers less than 5 (beginning around 5.5 years and a solid expectation by 6). Until a child can conserve item sets of 13 to 20, his or her success at calculations will likely remain the product of memorization or chance, as opposed to concepts of true numeracy.
Note. Children were scored using three categories: successfully completed without pennies (2), successfully completed with pennies (1), or unsuccessful (0).
Table column headings (age band in years, with sample size): 3.0 (n = 51), 3.5 (n = 129), 4.0 (n = 172), 4.5 (n = 233), 5.0 (n = 232), 5.5 (n = 183), 6.0 (n = 136)

Conclusion
The results of this study, based on a culturally and socioeconomically diverse sample of children 3 to 6 years of age in seven age bands, provide evidence that children's performance on developmental and academic tasks, as measured by the GDO-Revised, occurs in a sequential progression of mastery which increases with age. In addition, the results provide evidence that not all children of the same chronological age arrive at each developmental level for the same tasks at the same time. Thus, there exists variation in performance on developmental and academic tasks between children of the same age. Future research should include a more intensive analysis of the data by weighing variables such as child ethnicity, geography, and socioeconomic level to pursue stability in the findings. It is essential that educators, policy makers, and parents understand the significance of developmental level when setting standards for all children. Because children in kindergarten are at various chronological ages and develop at varying rates, having the same set of standards and expectations for all children at a given time is both inappropriate and potentially harmful for children.
Utilizing standardized, performance-based instruments to understand a child's developmental level, cultural and social influences, and individual interests allows for appropriate expectations, relevant goals for learning, and proper accountability in the educational system. Educators can utilize each child's unique developmental profile to plan curriculum that respects the developmental level and potential of the child by using robust observational methods coupled with comprehensive developmental assessment tools.
The results of the GDO Study presented here strongly support the GDO-R as a reliable and valid developmental measurement tool, confirm the essential role that a child's developmental level plays in his or her success for learning today, and suggest that having the same expectation for all children at the same time is inappropriate if not impossible.

Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author(s) received no financial support for the research and/or authorship of this article.
Note. n = sample size; M = mean; SD = standard deviation; r = Pearson product moment correlation coefficient.