Readability, content, and mechanical feature analysis of selected commercial science textbooks intended for third grade Filipino learners

Textbooks remain major learning aids at all levels of education worldwide. Thus, textbooks must constantly be subjected to critical analysis in order to establish their curricular usefulness. This paper analyzed four commercial science textbooks designed for third grade Filipino learners. The four textbooks were analyzed according to readability and comprehensibility, content, and mechanical features. Readability was evaluated using popular readability formulas, while comprehensibility was determined using Sonmez's formula and the cloze test method. The content features were the textbooks' alignment with national science standards, conceptual errors, and level of gender bias. Finally, the mechanical features focused on the printing, lay-out, and handiness of the textbooks. Readability analysis revealed that the textbooks were written for readers three to four grade levels higher, and two to three years older, than their intended users. The dominant reading ease of the four textbooks ranged from fairly easy to easy, which means that the texts are suitable for sixth and seventh graders. The texts were highly to perfectly aligned with the country's national science standards. Three textbooks are generally gender fair while one has a low-level male bias. The average error/conceptual problem density is one error in every six to eight pages. Misidentifications are the most common conceptual problems in the textbooks. Finally, the textbooks were rated very good in printing, lay-out, paper quality, binding, and handiness.

Subjects: Primary/Elementary Education; Childhood; Classroom Practice

Apler J. Bansiong

ABOUT THE AUTHOR
Apler J. Bansiong is an associate professor in science education in a state-owned university in a northern Philippine province. He is currently pursuing his PhD in Education major in Biology at the University of the Philippines Open University. His research interests include ethnobotany, curriculum materials development and evaluation, and inquiry-based learning.

Bansiong, Cogent Education (2019), 6: 1706395
https://doi.org/10.1080/2331186X.2019.1706395
© 2019 The Author(s). This open access article is distributed under a Creative Commons Attribution (CC-BY) 4.0 license.
Received: 18 August 2019
Accepted: 03 December 2019
First Published: 18 December 2019
*Corresponding author: Apler J. Bansiong, College of Teacher Education, Benguet State University, Philippines
E-mail: abansiong@yahoo.com
Reviewing editor: Peter Wan, Curriculum and Instruction, Education University of HongKong, Hong Kong


PUBLIC INTEREST STATEMENT
This paper evaluated four commercial science textbooks intended for non-native English speakers who are taking science for the first time as a formal course. It added relevant information to the limited research on the quality of textbooks, especially those textbooks published in the Philippines and used by Filipino learners. While there are problems with the readability and comprehensibility features of the four textbooks, they are nevertheless satisfactory in terms of content and mechanical features. These positive results are indications of the book publishers and writers giving premium on content alignment with curriculum standards, accuracy of

Introduction
Even in this digital age, textbooks remain the most valuable component of the curriculum in many countries, despite the proliferation of electronic media and other more sophisticated sources of information (Jonane, 2015; Ruddick, 2014; Wu & Liu, 2015). Research indicates that students spend 80-95% of classroom time on textbooks (Sadker & Zittleman, 2007), while teachers spend 70-90% of classroom time on these learning materials. Most of teachers' instructional decisions are based on textbooks, and students usually absorb all the details in textbooks without doubt (Sadker & Zittleman, 2007).
Textbooks are widely used because they are prepared according to a certain organization that is compliant with current curriculum mandates (Cardak, Dikmenli, & Guven, 2016). This textbook usefulness is particularly true in primary education where classes are taught by generalist teachers who are expected to teach all the courses within a particular year level.
Even with the science education reform efforts to deemphasize the use of textbooks in science teaching in favor of more engaging learning experiences such as inquiry, textbook use in science classrooms remains popular (Izgi & Seker, 2012). The reality of time and resource constraints, and the high proportion of non-specialist teachers teaching science, have resulted in an overreliance on textbooks (Mcdonald, 2016). Such an observation is not very different from the realities in the Philippines (SEI-DOST & UP NISMED, 2011).
The indispensability of textbooks, particularly in the foundation years of education, necessitates careful analysis and scrutiny of their content and appropriateness to their intended users. According to Ciftci, Cecen, and Melanlioglu (2007), a textbook should be "appropriate to the students' age and level of knowledge", and "prepared in line with the prescribed curricula". Such textbook features were supported by Devetak and Vogrinc (2013), and Izgi and Seker (2012). Trowbridge, Bybee, and Powell (2000) contend that probably the most important consideration when analyzing textbooks for science is their readability and comprehensibility. Indeed, the usefulness of a textbook as a curriculum material depends on whether or not the users can understand its content. Moreover, textbooks with reading levels that are inappropriate to the levels of the prospective users would result in frustration or boredom. These authors suggest that "teachers must select textbooks with reading levels at or slightly below the intended grade or age" (p. 394).
Some authors regard readability and comprehensibility as synonymous (Adelberg & Razek, 1984, cited in Wissing, Blignaut, & Van den Berg, 2016). However, more recent writers contend that the two attributes are closely related but are intrinsically different (Chiang, Englebrecht, Phillips, & Wang, 2008). This latter notion drew support from Smith and Taffler (1992, cited in Wissing et al., 2016). In differentiating the two attributes, Wray and Dahlia (2013) described readability as a characteristic of the text and comprehensibility as an indication of how the readers will make meaning of the text. While a text has to be readable to be understandable, it is comprehensible when it is not syntactically difficult, and is suited to the reader's background, prior knowledge, interest, and general ability (Jones, 1997).
According to Adelberg and Razek (1984, cited in Wissing et al., 2016), a learning material is comprehensible if the readers can understand the content of the material and complete the act of communication initiated by its writers, and when the receiver receives the message as intended by the sender (De Vos & Raepsaet, 2010, cited in Wissing et al., 2016). The comprehensibility of a text can be influenced by its readability. For a text to be understood by a reader, it has to be readable by that reader. A readable text may not necessarily be understandable, even as its understandability can at least be partially predicted by its readability level (Plucinsky, Olsavsky, & Hall, 2009).
In a more practical sense, the readability and comprehensibility of a textbook depend on the language spoken by the prospective users. Several studies have shown that non-speakers of the language used in textbooks are confronted with the difficulty of learning the content, as well as the new language (Rollnick, 1999, cited in Yong, 2010). For instance, Lemke (1997, cited in Yong, 2010) reported that English as a second language (ESL) learners have to address two issues in learning science: to learn a new language (i.e., English), and to master the science content. Hence, one of the major problems confronted by ESL learners in learning is the lack of language proficiency (Yong, 2010).
Science as a separate learning area is first taught in the Philippines in the third grade. This learning area is taught in English. This makes the teaching and learning of science in the third grade particularly interesting because at the age of about eight or nine, Filipino learners are in the stage of transition from learning to read, to reading to learn (Sibanda, 2014). It is therefore an added challenge for science teachers to bridge the impending language difficulties and learning gaps.
To an outsider, the use of English as a medium of instruction in science teaching may not be an issue in the Philippines, since English is both an official and a widely spoken language (Smolicz, Nical, & Secombe, 2001). The Philippines presently ranks fourth among the largest English-speaking nations, behind the United States, Great Britain, and India (Floro, 2006). Since the last quarter of the 20th century, English has been used as a medium of instruction, alongside Filipino, which is the national language (Smolicz et al., 2001). However, despite the linguistic advantage, the fact remains that English in the Philippines is a second language, and Filipino learners are ESL learners. The findings in this study may therefore be interpreted in the context of ESL learners, and other countries with a similar context may learn from the implications of the findings of this study.
Another important aspect of textbook evaluation is the critical analysis of their content. Textbook evaluators do content analysis by determining the textbooks' appropriateness to the developmental levels of the prospective users. Most importantly, experts subject textbooks to content analysis to determine alignment with the prescribed curricular standards. Content analyses of textbooks also allow evaluators to detect and correct factual and conceptual errors in the text. Needless to say, error analysis is extremely crucial because the purpose of textbooks as curriculum materials is to provide accurate information to the readers. Finally, textbooks can also be content-analyzed in terms of some emerging content such as learner-centeredness and gender-sensitivity.

Textbook analysis studies in the Philippines
There seems to be little interest in the analysis of textbooks and other curriculum materials in the Philippines, as indicated by the scant published literature that covers this important topic. This is quite surprising as newspapers and mainstream media report on commercially prepared elementary and high school textbooks laden with errors, and/or failing the state-prepared textbook evaluation criteria (Araneta, 2005).
The few studies on textbooks intended for Filipino learners tackled issues on gender-sensitivity and sexism. For instance, Manalo (2018) found in her analysis of Junior High School English learners' materials that male characters showed far more visibility than female characters, who also rarely came first in gender pairings; male characters were engaged in more active roles than the female characters, who, as well, were often linked with occupations that required less leadership and knowledge-based skills and were low-income generating; female characters were often associated with negative traits, while male characters were often attached to positive traits; domestic roles were almost exclusively attached to female characters; and that there was no indication of gender variance or gender nonconformity in all the learners' materials that she analyzed.
Thus, Manalo (2018) recommends that K-12 English learners' materials be revisited and that the selection of the literary pieces consider the works of female authors who penetrated the literary canon. She further suggested that the Department of Education (DepEd) may require all writers and editors of textbooks and learners' materials, and the publishing companies that work for DepEd, to attend seminars and trainings on gender sensitivity.
Another study explored sexism in Philippine preschool English textbooks (Tarrayo, 2014), focusing on gender visibility (illustrations), "firstness", occupational-role representations, character attributes, and interests and lifestyles. The author found that the textbooks feature both genders, although they seem to favor males, thus appearing to be sexist. In occupational roles, females are far less visible and diverse, and are limited to stereotypical kinds of occupation. Interestingly, the textbooks depict females as being beautiful and passive, and males as being aggressive, dominant, and active. In terms of interests and lifestyles, females were more particularly represented in indoor activities, i.e., household chores. The author then recommended that teachers should always be sensitive to sexism and gender stereotypes and biases when selecting textbooks and other curriculum materials. Java and Parcon (2016) analyzed 10 grade-one textbooks used in Philippine public schools for the gender fairness of their illustrations. Using content analysis procedures, the authors observed that female images dominated reproductive functions. The frequent portrayals of traditional roles of both sexes manifest some forms of gender stereotyping. There was unequal representation of males and females, indicating gender bias.
The government-issued textbooks for Filipino kindergarten learners were analyzed by Faustino, Perez-Santos, Fernandez-Distajo, and Ladia (2013). Their analysis centered on the textbooks' content, activities, and skills to be developed in the learners. The contents of the textbooks were found to be divided into five learning areas: Filipino, English, numeracy, sensory-perceptual, and socio-emotional development. Furthermore, the contents were arranged from simple to complex. In terms of the activities included in the textbooks, the focus was on identifying, matching, completion, and coloring. Only a few activities develop higher-order mental skills such as problem-solving and creative and critical thinking skills. Moreover, the learning activities were found to be redundant and were bereft of interactive activities and exercises that stimulate thinking. The authors then recommended the inclusion of activities that stimulate higher-order thinking and promote problem-solving.
In his search for a paper that evaluates a Philippine textbook in science, this author came across a single work that evaluated an unpublished textbook titled "Concepts of Inorganic and Organic Chemistry". The textbook was authored by professors in a Philippine university (Sobremisana, Cruz, & Aragon, 2013) and is used exclusively by the students in this university. The said textbook was subjected to user evaluation as to lay-out and design, activities, skills, language type, and subject and content. The authors also performed a quasi-experimental procedure to determine the textbook's effectiveness in enhancing the performance of students in chemistry. Their results showed that the textbook was declared effective in all six characteristics. Also, the quasi-experimental procedure showed that students who used the textbook performed better in the chemistry achievement test than those who did not use it.
Finally, this author analyzed the instructor-prepared laboratory manuals intended for science education majors in a state-owned university in the Northern Philippines (Bansiong, 2018). The purpose of the study was to rate the level of inquiry in 10 laboratory manuals in three science fields: biology, chemistry, and physics. Results showed that the majority of the exercises were confirmatory in nature. Very few (less than 10%) of the exercises were higher than the guided inquiry level. There were limited activities involving open inquiry and no exercise involved authentic inquiry. Such results indicate that there is a need for professors handling science education majors to focus on less-structured forms of inquiry in their laboratory activities, in order to sow the real seeds of inquiry in the future science teachers expected to reform science teaching in the K to 12 levels.
There are not many studies that ventured into the evaluation of science textbooks intended for Filipino learners, and even fewer on the grade level appropriateness of these textbooks. Moreover, the author has yet to encounter published papers investigating the content and readability appropriateness of science textbooks written in English, particularly those written for the first formal science course in the elementary level. Also, while there is some noise in the popular and mainstream media about government-funded textbooks loaded with errors, as pointed out earlier, there is no empirical evidence to support this claim. This present study could be one of those which will provide insights on this aspect. This paper analyzed the readability level and content of four commercial science textbooks intended for third grade Filipino learners. Specifically, it aimed to: (a) compare the readability features of the textbooks as to age and grade level appropriateness and reading ease; (b) determine the comprehensibility and reading levels of the textbooks; (c) analyze the content features of the selected textbooks as to alignment with national science standards, editing flaws and conceptual problems, and gender sensitivity; (d) determine the teachers' evaluation of the textbooks' lay-out; and (e) report on experts' and parents' evaluation of the durability and handiness of the textbooks.

Sample textbooks
Four commercial science textbooks, which were intended for third grade Filipino learners, were the case textbooks in this study. The evaluated copies were those provided by publishing companies to the schools as evaluation copies. The textbooks were chosen based on the popularity and reputation of the publishers, and upon the recommendation of teachers and local curriculum experts. All the textbooks are the latest editions, and are based on the new K to 12 program of the Philippine Department of Education. The book titles and their publishers are shown in Table 1.
Textbooks A and D both consist of 426 pages, while Textbook C and Textbook B have 378 and 296 pages, respectively. As to the number of pages allotted for content, Textbook D and Textbook B have the most pages, with 229 and 213 pages, in this order. Meanwhile, Textbook C has allotted 210 pages for content, while Textbook A has the least content allotment at 177. The other pages were used for activities and exercises, aside from the usual introductory and summary pages.

Textbook analysis procedures
Topics common to all four textbooks were chosen for analysis. From these topics, 200-word passages were selected and digitized verbatim in a word processor. The digitized passages were copied and pasted into the websites of the online readability calculators. The readability values generated were used in the analysis.

Determination of readability features
Three online readability calculators were used in the analysis of the readability features. These are the Text Readability Consensus Calculator (TRCC; Readability Formulas, n.d.), the Free Readability Test Tool (FRTT), and Analyze My Writing (AMW; AMW.com, n.d.).
The TRCC uses seven popular readability formulae to calculate the average grade level, reading age, and text difficulty of text, based on the number of syllables, words, and sentences in a sample. The formulae used by the TRCC are the Flesch Reading Ease, Gunning Fog, Flesch-Kincaid Grade Level, Smog Index, Coleman-Liau Index, Automated Readability Index, and Linsear Write formulae (Sibanda, 2014).
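As a rough illustration of how two of the formulae above turn syllable, word, and sentence counts into scores, the following sketch computes the Flesch Reading Ease and the Flesch-Kincaid Grade Level with their standard published coefficients. The helper `count_syllables` is a crude vowel-group heuristic assumed here for illustration only; the internals of the TRCC are not described in this paper, so its counts and results may differ.

```python
import re

def count_syllables(word):
    """Very rough estimate: count groups of consecutive vowels (an assumption)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def text_counts(text):
    """Return (sentences, words, syllables) for a passage."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return len(sentences), len(words), syllables

def flesch_reading_ease(sentences, words, syllables):
    """Higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def flesch_kincaid_grade(sentences, words, syllables):
    """Approximate US school grade level."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

# Hypothetical sample passage, not taken from the case textbooks
sample = "Plants need water. They also need sunlight and air."
counts = text_counts(sample)
print(round(flesch_reading_ease(*counts), 1))
print(round(flesch_kincaid_grade(*counts), 1))
```

Both formulas depend only on average sentence length and average syllables per word, which is why short passages with a few long words can shift the estimated grade level considerably.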
Meanwhile, the FRTT reports the results of six common readability formulas, including the complex word densities of printed texts. Hence, this online tool was also used to determine the complex word densities of the selected passages from the four textbooks.
To cross-validate the results of the web-based readability calculators and analyzer utilities, two alternative procedures were employed. The first procedure used Sonmez's formula (Sonmez, 2003, cited in Cardak et al., 2016), while the second used the cloze test method.
Before the Sonmez's comprehensibility and cloze test procedures were employed, the pupils' parental consent was obtained through a request letter. The letter explained the purpose of the study, emphasizing that the activity is not a gauge of their children's abilities, but a measure of whether or not the textbooks are "readable" to the intended users. The parents were also informed that the activities were to be accomplished anonymously.
Twelve third grade pupils from a private elementary school and 10 incoming third graders from a laboratory elementary school were asked to accomplish the Sonmez's comprehensibility and cloze tests. In Sonmez's method, the pupils were asked to read a passage selected from the four textbooks and to encircle all the words that they could not understand. The number of unknown words was determined and the mean scores were obtained. The Sonmez's comprehensibility of the text was then calculated using Sonmez's formulas. The results were compared with Sonmez's table on comprehensibility, as reported by Cardak et al. (2016), and given in Table 2.
The second cross-validating alternative was the cloze test method. A cloze test involves removing certain words in the text, then letting the readers fill in the blank spaces with the missing words. A cloze test was developed by selecting a text passage from the four textbooks and deleting every fifth word in the selection to make a blank space. Scoring was based on the percentage of words that match the original text exactly. Based on the scores of the readers, the reading levels of the text were interpreted following the recommendations of Wellington and Osborne (2001), as cited in Al Qaydi (2015).
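The cloze procedure described in this section (every fifth word deleted, exact-match scoring) can be sketched in a few lines. This is a minimal illustration, not the instrument used in the study: the function names, the blank marker, and the choice to ignore case and surrounding punctuation when comparing answers are all assumptions.

```python
import re

def make_cloze(passage, n=5, blank="_____"):
    """Replace every n-th word with a blank; return (cloze_text, answers)."""
    words = passage.split()
    answers = []
    for i in range(n - 1, len(words), n):
        answers.append(words[i])
        words[i] = blank
    return " ".join(words), answers

def normalize(word):
    """Lowercase and strip surrounding punctuation (an assumed scoring rule)."""
    return re.sub(r"^\W+|\W+$", "", word).lower()

def cloze_score(answers, responses):
    """Percentage of responses that match the deleted words exactly."""
    if not answers:
        return 0.0
    correct = sum(
        normalize(a) == normalize(r) for a, r in zip(answers, responses)
    )
    return 100.0 * correct / len(answers)

# Hypothetical passage and pupil responses
text, key = make_cloze(
    "The sun gives us light and heat every single day of the year"
)
print(text)
print(cloze_score(key, ["light", "night"]))
```

Because only exact replacements count, a pupil who supplies a sensible synonym still scores zero for that blank, which is one reason cloze percentages tend to run low.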

Content analysis procedures
Alignment with national science standards
A checklist containing the competencies expected for third grade pupils was prepared. These competencies are those specified in the national science standards by the Philippine Department of Education. Each of the four textbooks was rated based on its compliance with the standards. A scale of zero to two was used to rate the textbooks, with 0 for not covered, 1 for partially covered, and 2 for sufficiently covered. Three grade three science teachers, who were recommended by their respective school heads and who have at least 10 years of teaching experience in the third grade, were invited to evaluate the four textbooks. Their scores were then consolidated to constitute an overall rating. In cases where the three evaluators did not agree in their ratings, the two-out-of-three rule was applied, or the evaluators discussed the ratings with each other and with this researcher until consensus was reached.
The standards for third grade science in the Philippines are given as follows: At the end of Grade 3, learners can describe the functions of the different parts of the body and things that make up their surroundings-rocks and soil, plants and animals, the Sun, Moon and stars. They can also classify these things as solid, liquid or gas. They can describe how objects move and what makes them move. They can also identify sources and describe uses of light, heat, sound, and electricity.
Learners can describe changes in the conditions of their surroundings. These would lead learners to become more curious about their surroundings, appreciate nature, and practice health and safety measures (Department of Education, 2012).
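The rating procedure for alignment can be illustrated with a short sketch. The two-out-of-three rule comes from the procedure described in this section; expressing the overall rating as a share of the maximum possible score is an assumption made here for illustration, since the paper does not state how the consolidated scores were combined. The function names and the sample ratings are hypothetical.

```python
from collections import Counter

def consolidate(ratings):
    """Apply the two-out-of-three rule to one competency's ratings.

    Returns the rating shared by at least two evaluators, or None when all
    three differ and the item has to be settled by discussion.
    """
    value, count = Counter(ratings).most_common(1)[0]
    return value if count >= 2 else None

def alignment_ratio(consolidated):
    """Share of the maximum possible score (2 points per competency)."""
    return sum(consolidated) / (2 * len(consolidated))

# Hypothetical ratings for four competencies from three evaluators
raw = [(2, 2, 2), (2, 1, 2), (1, 1, 2), (0, 1, 2)]
merged = [consolidate(r) for r in raw]  # [2, 2, 1, None]

# Suppose discussion settles the last competency at 2
merged[-1] = 2
print(alignment_ratio(merged))
```

Returning `None` for a three-way split keeps the unresolved items visible, so they can be flagged for discussion instead of being silently averaged.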

Editing flaws and conceptual problems.
Editing flaws, such as errors in spelling, punctuation, and syntax, were identified in all chapters of the case textbooks. Meanwhile, misconceptions were identified from the selected chapters and were classified, based on the work of Dikmenli, Cardak, and Oztas (2009), as misidentifications, overgeneralizations, oversimplifications, obsolete concepts and terms, and under-generalizations.
Gender-sensitivity
The gender-sensitivity of the four textbooks was analyzed using the GB14 analysis tool, developed by Parkin and Mackenzie (2017). Permission to use this tool was granted by its developers. The tool was chosen for its appropriateness to all levels and for its simplicity of use. It consists of 14 questions to be answered per chapter. A difference of 15 or more between the male and female scores indicates significant gender bias. If the male score was higher than the female score, the result was given a positive (+) sign. If the female score was higher than the male score, it was given a negative (-) sign. Genderness was interpreted using the continuum proposed by Parkin and Mackenzie (2017), which is shown in Figure 1.
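The sign convention and the 15-point threshold described in this section can be expressed as a small helper. This is only a sketch: how the 14 questions of the GB14 tool are turned into male and female scores is not described here, so the inputs below are hypothetical numbers and the function name is an assumption.

```python
def genderness(male_score, female_score, threshold=15):
    """Return (signed difference, whether the bias is significant).

    Positive values mean the male score is higher; negative values mean
    the female score is higher.
    """
    diff = male_score - female_score
    return diff, abs(diff) >= threshold

# Hypothetical chapter scores
print(genderness(40, 20))  # male score higher by 20
print(genderness(10, 18))  # female score higher by 8
```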

Determination of lay-out and other mechanical features
Eleven grade-three teachers and three experts on children's curriculum materials were invited to evaluate the lay-out features of the four textbooks. The desired lay-out features were those identified in Trowbridge et al. (2000), and the ones proposed by the curriculum materials committee of the Philippine Department of Education. These desired features include key terms emphasized through bold letters, appropriate font size and font style, clear and colored illustrations, and text and diagram lay-out. All of these lay-out features can influence the readability of the textbooks. Finally, as an added feature, the textbooks were also evaluated in terms of the quality of paper used, binding, and handiness.

Statistical treatment of data
For the readability analysis, Pearson correlation was used to associate the values obtained in the different parameters. Analysis of variance was used to determine significant differences in the four textbooks along the quantitative data gathered.
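For readers unfamiliar with the two statistics named above, the sketch below computes a Pearson correlation coefficient and a one-way ANOVA F ratio from first principles. It is illustrative only: the data are hypothetical, and obtaining p-values or Tukey's HSD comparisons would require a statistics package, which this sketch does not attempt.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def anova_f(groups):
    """One-way ANOVA F ratio: between-group over within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum(sum((v - m) ** 2 for v in g) for g, m in zip(groups, means))
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

# Hypothetical grade-level estimates for passages from three textbooks
print(anova_f([[6.0, 7.0, 8.0], [5.0, 6.0, 7.0], [6.0, 6.5, 7.0]]))
```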

On-line readability procedures
Based on the results of the readability consensus, the four textbooks sampled are written for learners who are 2-3 years older and 3-4 grade levels higher than their intended users (Table 3 and Table 4). Comparing the age and grade level appropriateness of the four textbooks, Textbook A appears to be the most advanced, while Textbooks B and C are the least advanced. Analysis of variance and Tukey's HSD reveal no significant differences among the four textbooks in age appropriateness, but significant differences were noted in grade level appropriateness.
Based on the analysis, the four textbooks are supposedly suited to readers up to 14 years old, or up to the ninth grade. This means that the most advanced passages are 6 years or grade levels higher than their intended users. The least complex passages are one grade level higher than the intended users, although one passage sampled from Textbook A is suited to learners who are one-and-a-half years younger than the target users.
The reading ease of the sampled passages from the four textbooks is shown in Figure 2.
As depicted in Figure 2, Textbook B has the most varied level of reading ease, ranging from "fairly difficult to read" to "very easy to read". In contrast, Textbook C has sections that are "fairly difficult to read", but none of the sampled sections are "very easy to read". The reading ease of Textbook D ranges from "fairly easy" to "easy", while Textbook A had "average/standard" to "fairly easy" reading levels.
According to Curtis and Hassan (cited in To, Fan, & Thomas, 2013), written texts which are "fairly difficult to read" are suited for students from Grade 10 to Grade 12, and those described as "average" or "standard" are appropriate for those in Grades eight and nine. When texts are described as "fairly easy to read", they are suited for Grade seven learners. Finally, texts described as "easy" and "very easy to read" are appropriate for learners in Grade six and Grade five, respectively.
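The mapping from reading-ease descriptions to grade levels given above can be written as a simple lookup. The grade levels are those stated in this section; the numeric cut-offs are the conventional Flesch Reading Ease bands, which are assumed here because the paper does not list the score ranges it used. The names `BANDS` and `describe` are hypothetical.

```python
# (lowest score in band, description, suited grade levels)
BANDS = [
    (90, "very easy to read", "Grade 5"),
    (80, "easy to read", "Grade 6"),
    (70, "fairly easy to read", "Grade 7"),
    (60, "average/standard", "Grades 8-9"),
    (50, "fairly difficult to read", "Grades 10-12"),
]

def describe(score):
    """Return (description, grades) for a Flesch Reading Ease score.

    Scores below 50 fall outside the categories discussed in this section,
    so None is returned for them.
    """
    for lower, label, grades in BANDS:
        if score >= lower:
            return label, grades
    return None

print(describe(75))
```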
These findings confirm the results depicted in Tables 3 and 4, showing that the reading levels of the four textbooks are far higher than the reading levels of their intended users. Such findings are far from positive as the intended users are in the stage of transition from learning to read, to reading to learn. According to experts in curriculum materials development, textbooks must be written two levels lower than their intended users (Trowbridge et al., 2000). In the case of the intended users who are non-native speakers of English, reading science texts which are written far beyond their grade or age level must be frustrating.

Textbook B: 8.5-14; 8.5; 10.34a; 4-9; 6 and 7; 6.00b
Textbook C: 8.5-14; 8.5 and 10.5; 10.31a; 4-9; 6; 6.00b
Textbook D: 8.5-14; 8.5, 10.5, and 14; 10.56a; 5-9; 5, 6, and 7; 6.19ab
*Means with the same letter are not significantly different at p = .05, Tukey's HSD

Figure 2. Reading ease of the samples from the four textbooks.
Certain issues arise with the use of readability formulas. Some authors argue that their use in determining the readability levels of textbooks has done more harm than good (Armbuster, Osborn, & Davison, 1985). According to these authors, "the most popular readability formulas only use word difficulty and sentence length in determining readability levels. They fail to account for other characteristics that affect comprehension, for example, content difficulty and familiarity, organization of ideas, author style, page layout." (p. 18).
Comprehensibility levels of the textbooks

Sonmez's comprehensibility scores of the four textbooks
Some incoming third grade pupils were asked to examine some selected sections of each of the textbooks to find out whether or not these textbooks are comprehensible to them. Table 5 presents the result of the comprehensibility analysis.
The comprehensibility analysis of the four textbooks shows that the randomly selected passages came up with values that fall under the comprehensible dimension. This result implies that the third grade learners who will use the book can understand the contents of the textbooks with little or no help. Obviously, such findings did not conform to the earlier result of the readability formulas.
While the four textbooks were all comprehensible, it still remains a challenge for the writers of these textbooks to use simpler synonyms for some of the words in order to elevate the books to the level that is clear and comprehensible, or even to complete communication. This is especially so as the textbooks are written for the first formal science course, and the users are transitioning from the stage of learning to read to reading to learn.

Reading level based on cloze test
The result of the cloze test indicates that, based on six selected passages, the reading levels of all four textbooks were at the frustration level (Table 6). In all four textbooks, the average percentage of word recognition in the cloze test was much lower than 40%, which is the cut-off score for the independent level. Such findings mean that the textbooks are difficult to read, even with teacher assistance. This result supports the findings of Cardak et al. (2016), Al Qaydi (2015), and Yong (2010), who all reported that the reading levels of the science textbooks that they analyzed were mostly at the frustration level.

Alignment with the national standards for grade three science
The textbooks' degree of agreement with the national standards for third grade science is shown in Table 7. Textbook B is in perfect agreement with the national science standards for third grade pupils. All the learning competencies specified in the standards were targeted in this textbook. The textbook's chapters are in fact arranged according to the order of the competencies specified in the standards.
Textbook C also has a very high degree of alignment (r = .9545). In this textbook, a competency was addressed, but not adequately. Textbooks C and D also followed the order of the competencies in the standards. The lowest degree of alignment was seen in Textbook A, although its degree of alignment with the standards is still high. It is noteworthy that Textbook A did not tackle the required competencies under heredity and variation, including the topic on ecosystems.
The findings on the very satisfactory to excellent alignment of the contents of the textbooks with national standards are good indications that the publishers of the four textbooks are cognizant of the importance of content alignment with standards. When a textbook's contents are not aligned with the standard, then the validity of such curriculum material is sacrificed.

Error and conceptual problems in the four textbooks
Textbook D has the greatest number of errors and conceptual problems, while Textbook B has the fewest (Figures 2 and 3). However, considering the average number of errors per page, the highest proportion was noted in Textbook A (0.181), followed by Textbook D (0.153). Most of the conceptual problems detected in the four textbooks were misidentifications (MI), representing 71.30% of the total conceptual problems in these materials. The greatest proportion of misidentifications was noted in Textbook A (81.25%) and the smallest in Textbook B (50.00%).
The next most dominant conceptual problems were overgeneralizations (13.91%). Textbook B, in particular, had a relatively higher number of overgeneralizations than the rest of the textbooks.
The least prominent conceptual problems were oversimplifications, constituting only 2.61% of the conceptual errors detected. These oversimplifications involve the textbooks' presentation of the raw materials and products of photosynthesis using a single equation. Such a simplified illustration can lead to learner misconceptions, as learners might think that photosynthesis is a simple process. Such an error, incidentally, is common in primary science textbooks in Turkey, as reported by Dikmenli et al. (2009).
Some of the common misidentifications in the four textbooks are presented in Table 8.
Some of the misidentifications detected were factual errors, such as misidentifying the nasal hairs as cilia (Textbooks A and C). Other misidentifications include defining germination as the process by which the embryo grows from seeds (Textbook A), instead of the emergence of a certain length of the radicle. Also included in the list of misidentifications are illogical, incomplete, or erroneous explanations. This category includes the following: "People cause (instead of contribute to) erosion by rampant cutting down of trees or burning trees" (Textbook C), and "Loam is sand and clay" (Textbook A), which does not emphasize the presence of humus or rich organic materials. Also in this category is a statement that gives the wrong function of a structure: "The skin hairs serve as a protection against harmful substances that may enter the body" (Textbook B). Finally, another observed misidentification was an incomplete definition of weather: "Weather is a condition of the atmosphere." A more complete definition would be: "Weather is the condition of the atmosphere over a given place at a certain period of time."

Gender issues in the four textbooks
Table 9 shows that out of the 20 chapters selected from the four textbooks, 75% were gender-fair, while the rest possessed low-level biases towards either males (15%) or females (10%). When data from the individual passages were combined, three textbooks emerged as gender-fair, while one textbook, Textbook C, showed some low-level male bias. All of the selected passages from Textbook B were found to be gender-fair. Textbook A had one female-biased chapter, while Textbook D had two chapters that were either male- or female-biased.
In Textbook C, the low-level male biases were observed in the topics on Force and Motion, and on Energy. This result finds support in the gender-research literature in science, which states that males prefer science concepts related to the physical sciences while females are more interested in life science ideas (Trowbridge et al., 2000).

Teachers' and experts' evaluation of the layout of the textbooks
The third grade teachers and the curriculum experts who evaluated the four textbooks indicated that these textbooks were all "very good" in terms of overall layout (Table 10). Despite the higher mean ratings for Textbook A and lower ratings for Textbooks B and D, ANOVA indicates that these ratings are not significantly different at the 0.05 level. Textbook A was evaluated as "very good" in the three areas, with "outstanding" diagram color and clarity. However, the diagram color and clarity in the other textbooks were rated as "good". All four textbooks were "very good" in emphasizing key terms and in the balance between diagrams and text.
The teacher evaluators rated Textbook C "good" in terms of font size and line spacing, while all others were rated "very good" on this criterion. This could be explained by the large discrepancy in the average character density per page among the four textbooks. On average, Textbook C has 1,103.5 characters per page. To reach such a number, the font size and the line spacing in the text must be smaller. The character density value obtained from Textbook C was significantly higher than the averages in the three other textbooks, which are 603 for Textbook B, 609 for Textbook D, and 661 for Textbook A.
There is no published literature on the recommended character density per page, especially for science textbooks intended for K-3 learners. However, these initial findings could provide textbook developers with some information about the character density preferred by readers or their teachers.
In their paper on the evaluation of selected textbooks from Ghanaian primary schools, Essuman and Osei-Poku (2015) reported that the typeface, typestyle, and type sizes were all appropriate. Moreover, they reported that the line length and spacing within the body text make these texts legible and readable. Such features of the analyzed textbooks seem to be similar to the result of this present evaluation. However, the Ghanaian primary textbooks were observed to have illustrations of inferior quality. Such findings are inconsistent with the findings of this study.
The science textbooks were also subjected to stakeholder evaluation on their mechanical features of paper quality and handiness. The result of the evaluation indicates that all four textbooks are "very good" in terms of paper quality. The same "very good" rating was given to all four textbooks in terms of handiness, although the highest rating was given to Textbook B. Textbook B is the smallest of the four textbooks, making it both light and handy. Such an added feature is important for young learners who need to carry several textbooks every day.

Discussions
This paper evaluated four commercial science textbooks intended for third grade Filipino learners. The textbooks were selected based on the popularity and reputations of the publishers. The textbook features analyzed were readability, content, and mechanical features. To determine the readability levels of the textbooks, two approaches were employed: the readability formula method and the user evaluation approach. In the latter approach, the comprehensibility and the cloze test procedures were used. Based on the readability results, all four textbooks are written 3-4 years, or grade levels, higher than the level of the intended users. The readability results imply that the texts may have used words that are too advanced for the intended users, and/or sentences that are inappropriately long, making the sampled passages difficult to read. Such results seem to indicate that the publishers and book writers either did not consider the age and developmental levels of third grade Filipino learners, or assumed that the users are academically gifted, as are most learners from privately operated schools in urban Philippines.
This result on reading levels far exceeding the age and grade levels of the intended users is similar to the findings of Yong (2010) and Cardak et al. (2016) on science textbooks intended for users of various grade levels in various countries around the world. In Yong's study on a government-issued science textbook for seventh graders in Brunei, the sampled textbook had a reading level 2.3 years higher than that of its intended users. Similarly, the science textbooks for Ghanaian secondary learners analyzed by Gyasi (2013) were evaluated to be very difficult to read and understand. The author then recommended that the Ghana Association of Science Teachers, the principal authors of the textbooks, simplify their text materials to improve the textbooks' readership and understanding (p. 13).
The results of the two tests for comprehensibility were contrasting. While Sonmez's procedure yielded a result indicating text comprehensibility with little or no help, the cloze test result showed otherwise. The cloze test procedure yielded recognition rates that are indicative of frustration levels, i.e., the reading materials are difficult for the students.
Aside from Cardak et al. (2016), several comprehensibility studies using the cloze test also arrived at frustration levels. Wissing et al. (2016), for instance, reported that the introductory accountancy textbook used for undergraduate students in South Africa is in the frustration level, a result that arose from the cloze test procedure employed. Similarly, Gyasi (2013), in his study on the comprehensibility of Ghanaian science textbooks, found that the biology, integrated science, and physics textbooks were considered in the frustration level by most of the intended users. The same result emerged from Yong's (2010) analysis of the readability and comprehensibility of the government-issued science textbook in Brunei.
The inconsistency in the results of the two comprehensibility techniques conforms to the findings of Cardak et al. (2016). Consistent with the result of this study, Cardak et al. (2016) found that the readability levels of similar texts were higher when determined using Sonmez's formula than when using the cloze test.
The discrepancy can be explained in part by the different nature of the instruments used to measure the two constructs. While the comprehensibility study involved students crossing out the words they were not familiar with, the cloze test required the participants to supply the missing words. A recognition exercise is easier than a supply test. Hence, scores in a supply response test, like a cloze test, tend to be lower. These results could provide considerations in the interpretation of the results of the study, as well as insights for future comprehensibility studies.
Of the readability and comprehensibility analysis procedures employed in this study, only Sonmez's formula indicated a positive result. All other procedures indicate that the textbooks are not suited to the level of the intended users in terms of readability. These findings point to inconsistencies in the results of the tools used to measure readability and comprehensibility. To come up with a more valid report on text readability and comprehensibility, practitioners would be well advised to use different procedures and cross-validate the results of each of these techniques. This way, a more comprehensive and reliable result can be reported to book publishers and writers, or to policy makers and/or curriculum planners, for consideration.
Another concern of the study is content analysis, with emphasis on alignment with national science standards, errors and conceptual problems, and gender sensitivity. There seems to be a negligible content alignment problem in the textbooks, with one textbook covering all the required competencies for the grade level studied. This indicates that the publishers of the four textbooks place a high premium on content alignment, considering it one of the most important characteristics of textbooks. The publishers of these textbooks constantly check the alignment of the content with the national standards and even provide evaluation copies to many schools in the country.
The result on high to perfect content alignment contradicts the result of Saeed and Rashid (2014), who reported that Pakistani chemistry textbooks have content coverage issues and only partial alignment with national standards. The result of this present study also differs from the findings of Polikoff (2015) on the misalignment of fourth grade mathematics textbooks in some substantive areas, and of King (2010), who discovered limited coverage of earth science concepts in earth science textbooks in the United Kingdom. The authors, hence, recommend that book publishers give a higher premium to content alignment and appropriateness, since many teachers rely on textbooks as the sole source of the curriculum to be taught. With content poorly aligned with curriculum standards, many of the required competencies in a particular grade level may not be covered in class, and students and teachers might spend precious academic time on content that is unnecessary.
As to the analysis of errors and conceptual problems, it can be said that the four textbooks are error-light, since the average density is one conceptual problem or error in every six to eight pages. This count is low compared to the unverified finding of 2.46 errors per page in government-issued science textbooks for fourth graders, as reported by Go (2018). Accordingly, the errors "range from conceptual, pedagogical, grammatical, and logical errors, including illustration errors" (p. 2).
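The relation between an errors-per-page figure and a "one error in every N pages" statement is a simple reciprocal. The sketch below applies it to the two per-page values reported earlier for Textbooks A and D and to the Go (2018) figure; the function name is hypothetical, and the values for the other two textbooks are not given in this section, so they are left out.

```python
def pages_per_error(errors_per_page):
    """Average number of pages between errors, given an errors-per-page density."""
    if errors_per_page <= 0:
        raise ValueError("errors_per_page must be positive")
    return 1.0 / errors_per_page


# Per-page densities quoted in the text; other textbooks are omitted
# because their values are not reported in this section.
densities = {
    "Textbook A": 0.181,
    "Textbook D": 0.153,
    "Go (2018), government-issued": 2.46,
}

for name, density in densities.items():
    print(f"{name}: about one error every {pages_per_error(density):.1f} pages")
```

For Textbooks A and D this gives roughly 5.5 and 6.5 pages per error, while a density of 2.46 errors per page corresponds to less than half a page per error.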
Literature on conceptual problems and errors in science textbooks is scarce. One published paper examined misconceptions in 51 earth science textbooks used in England and Wales. Authored by King (2010), the study detected one earth science misconception per page. The misconceptions were particularly common in the areas of plate tectonics, sedimentary processes, earthquakes, and earth's structure.
In the present study, misidentifications surfaced as the most dominant error in the four textbooks. This result, incidentally, is similar to the analysis of Dikmenli et al. (2009) on the most common type of Biology-related errors in primary science and technology textbooks in Turkey. These misidentifications came in the form of facts, concepts, structures, labels, etc.
The misidentifications and other conceptual problems in textbooks can lead to learner misconceptions that might be difficult to eradicate. This is especially true for textbooks intended for the first formal course in science. As such, the science teacher must be cognizant of these misidentifications and offer correct explanations on these errors during their teaching.
Similarly, other conceptual problems in textbooks, such as overgeneralizations, oversimplifications, undergeneralizations, and obsolete ideas, can lead to learner misconceptions. Many beginning textbooks are prone to these types of conceptual problems as they attempt to simplify content for young learners. Most of the oversimplifications occur in the topic on photosynthesis and respiration, which, incidentally, is similar to the findings of Dikmenli et al. (2009). This result could be explained by the complexity of the topic on photosynthesis and respiration, which requires deeper background knowledge of biochemistry and thermodynamics.
Oversimplifications tend to sacrifice the true and deep understanding of scientific concepts. This could lead to learners developing incomplete and fragmented conceptualizations of the natural world. It is then the responsibility of the science teacher to make necessary corrections, or to caution the learners of the simplifications made by the authors.
This study adds input to the limited research on error and conceptual analysis in textbooks. With the wide circulation and dissemination of this report, publishers and authors may consider subjecting their books to editing and proofreading procedures in order to assure learners of a valid and credible source of information.
As to the gender sensitivity of the four textbooks, some good results were obtained. Three of the textbooks are gender-fair, while one has some low-level male bias. Such result seems to indicate that the authors of the textbooks were cognizant of gender roles and are sensitive and careful of their depiction of sex stereotypes. The positive results could be attributed to the nation's aggressive campaign on gender equality and equity lately.
The gender-fair representations in the four textbooks contradict the result of Parkin and Mackenzie (2017), who developed the GB14 instrument. Analyzing three Collins Key Stage 3 science textbooks, the authors discovered that in the three textbooks, there were more male images, more male role-models, more male pronouns, more male-gendered words, and more occasions where the "status" of the male was "improved" compared to the female. These tendencies toward male bias also emerged in the work of Wu and Liu (2015) on English primary textbooks in China. Similarly, in their study on Indian social studies textbooks, Sumalatha and Ramakrishnaiah (2004) noted various sex biases in the portrayal of men and women, or boys and girls. Esen (2007) also reported that the new textbooks prepared under the curriculum reform in Turkey are carriers of gender stereotypes, just like the older edition textbooks. Sex bias and stereotypes were likewise detected in Iranian English textbooks by Gharbavi and Mousavi (2012), as well as in the English textbooks used in Japanese senior high schools, as reported by Ruddick (2014).
Some studies on the gender perspectives of Filipino textbooks gave results that are contradictory to the result of this present investigation. The content analyses of Manalo (2018), Tarrayo (2014), and Java and Parcon (2016) all indicated that males and females were depicted in stereotypical roles in various Philippine textbooks.
The existence of gender biases in textbooks can undermine the attainment of gender equality, one of the cornerstones of both UNESCO's Millennium Development Goals (MDGs) and the goals of Education for All (EFA). Since textbooks and other learning materials play a key role in shaping the values, attitudes, and social skills of the learners, textbook writers and editors must be more conscious of how they depict male and female roles, especially in illustrations. They must also strive to use gender-fair language in the text.
As to lay-out, printing, handiness, and other mechanical features, the four textbooks were evaluated as "very good." Print and lay-outs are important considerations in selecting textbooks because these features can affect the textbooks' readability and comprehensibility. The font size must be appropriately large for younger readers. Proper placement of supporting pictures or diagrams is also important as they aid in the comprehensibility of written text. The usefulness of these diagrams may dwindle if they are not placed appropriately.
It is clear from the results of all the analyses that no single book emerged as superior to the others in all aspects. Some textbooks are better aligned with the national standards but are inferior in terms of printing quality and lay-out. Another textbook has excellent print quality, but it could be laden with conceptual problems. Or one textbook may be high in terms of learner engagement, but it could be too crowded and complex for the prospective readers.
It seems logical, therefore, for teachers and policy makers not to prescribe a single textbook. Instead, multiple educational materials should be prescribed as references. This way, the best features of one textbook, which may be absent in another, can be better exploited for the improvement of student outcomes. The country's educational officials may have realized this, as they no longer prescribe official textbooks in public schools. All they provide are learning materials and suggested references.

Conclusions
This study has added new information to the limited research on the grade and age appropriateness of textbooks intended for ESL learners. The study is particularly interesting as it analyzed the science textbooks of third graders, who are formally taking science for the first time. The study has highlighted the contrasting results of the two comprehensibility procedures, although such a difference can be explained by the difference in the complexity of the tasks used in the two procedures. Meanwhile, the readability formulas showed that the textbooks are too advanced for third grade learners, and the cloze test placed them in the frustration level of reading. These results indicate that the textbooks cannot be read by third grade pupils independently. They need the assistance of a teacher or a more knowledgeable adult to understand the contents of the textbooks.
As to content, the four textbooks seem to possess the qualities of good curriculum material, i.e., good alignment with national curriculum standards, few conceptual errors, and general gender fairness. These are indications that the publishers of the four textbooks employ stringent editing and proofreading mechanisms. The tough competition among private textbook publishers in the Philippines might explain the satisfactory results in the content component. The lay-out, printing, handiness, and other mechanical features are likewise rated as very satisfactory. Such satisfactory results underscore the publishers' emphasis on improving the readability and comprehensibility of the textbooks.
While this paper may make important contributions on the readability and content of Filipino textbooks, its scope is quite limited. It only analyzed commercial textbooks that are used mostly by students in privately owned schools in the Philippines. It is then a worthwhile endeavor to conduct a similar analysis of the science learning materials that are prescribed for public schools by the Philippine government. Such studies could provide empirical data on the quality of learning materials used by the majority of learners in the Philippines. Similar analyses may also be made of textbooks in other grade levels and in other learning areas. This way, stakeholders and policy makers can be informed of the quality of learning materials that the education department prescribes, and can use these results as their bases in crafting more stringent textbook quality monitoring schemes. Moreover, results of textbook quality analyses must be used by writers and publishers as benchmarks and considerations as they evaluate and revise the textbooks.