Gender differences in literacy in PIAAC: do assessment features matter?

Background: Previous research based on large-scale studies consistently suggests that, on average, male students tend to have lower literacy compared to their female peers during secondary schooling. However, this gender gap in literacy seems to "disappear" during adulthood. To date, only a few studies have investigated the role of assessment features in gender differences in literacy performance in adulthood. This study aims to understand the relationship between assessment features and gender differences in literacy skills. Methods: Using the German 2012 PIAAC data (N = 4,512), we applied item-level analyses using linear probability models to examine gender differences in the probability of solving a literacy item correctly across six assessment features: (1) text format, (2) text topic, (3) cognitive strategies, (4) text length, (5) numerical content of the text/questions, and (6) gender-typicality of content. Results: We found that men had a 13.4 percentage point higher probability of solving items with a non-continuous text format correctly than women. Men also had a 9.4 percentage point higher probability of solving short text items correctly and a 4.6 percentage point higher probability of solving items with medium/high numerical content in the question correctly than women. There were small to negligible gender differences in literacy performance in terms of text topics, cognitive strategies, and gender-typicality of content. Conclusions: Our findings highlight the role of text format, text length, and numerical content in gender differences in literacy skills, suggesting that further refining these practices can enhance the fairness and accuracy of literacy assessments. Specifically, we advocate for ongoing research aimed at understanding and minimizing the potential bias introduced by these assessment features.
Such efforts are not only crucial for developing instruments that accurately measure literacy skills, but they also yield insights that hold significant implications for educational researchers and practitioners dedicated to creating more equitable assessment environments.


Introduction
Miyamoto et al., Large-scale Assessments in Education (2024) 12:21

Literacy, the ability to understand, evaluate, use, and engage with written texts to participate in society, achieve one's goals, and develop one's knowledge and potential (OECD, 2013b), is an essential skill for participating in our society. A great deal of information in everyday life, such as instructions on how to use technical devices or medical prescriptions, can only be adequately understood with sufficient levels of literacy. Individuals also need to be literate to understand official announcements (e.g., travel restrictions due to COVID-19) and obtain information about important policy interventions (e.g., an increase in tax payments). In addition, literacy is necessary for filling out forms (e.g., for job applications), signing agreements (e.g., before getting a vaccine), and political participation (e.g., voting). Some of these may require a high level of reading and writing proficiency. Moreover, social communication increasingly takes place on social media platforms, such as Facebook or X, which requires people to read and understand various forms of online texts (e.g., e-mails, blogs, chats; OECD, 2013b). Literacy skills are also a prerequisite for everyday activities that can bring personal fulfillment and increase satisfaction with life. Therefore, it is important for society to reduce inequality in literacy skills for all individuals, regardless of social background.

Gender differences in literacy skills among school students are one of the inequalities often discussed in educational contexts. Previous research consistently suggests that, on average, male students tend to have substantially lower literacy than their female peers during secondary schooling (e.g., see Lietz, 2006, for a meta-analysis; OECD, 2019). However, this gender gap in literacy seems to "disappear" during adulthood. Previous studies find no gender differences in the average scores on
literacy tests among adults across ages (e.g., Borgonovi, 2022; Borgonovi et al., 2021; Lechner et al., 2021; Solheim & Lundetrae, 2018). There are several possible explanations for the differential gender gaps in literacy across age cohorts. One is that gender differences in literacy skills tend to increase during adolescence and decrease (or disappear) in young adulthood due to gender-specific differences in development, maturation, or behaviors. For instance, male and female individuals may differ in the rate of language competence development or the frequency of reading behavior at different stages of life. Another possible explanation is that surveys targeting different age groups differ in their assessment features (e.g., text types, item formats), and some assessment features may be more relevant to gender differences in literacy than others. For instance, previous research suggests that girls generally have higher literacy for narrative and continuous texts, while boys have higher literacy for informational and non-continuous texts (see Solheim & Lundetrae, 2018, for a review). However, there is still a lack of empirical studies investigating the role of assessment features in gender differences in literacy during adulthood.
The present study aims to investigate the role of six assessment features in gender differences in literacy, namely, (1) text format, (2) text topic, (3) cognitive strategies, (4) text length, (5) numerical content of the text, and (6) gender-typicality of the content, using a large-scale assessment of adults, while also accounting for literacy proficiency. A better understanding of the role of assessment features in gender differences in literacy will provide important insights into similarities and differences in men's and women's reading behavior and how they respond to literacy test items with certain assessment features. Such insights will be useful for educational researchers and practitioners to develop gender-fairer literacy assessment instruments and to obtain more accurate and unbiased estimates of gender gaps in literacy skills across the life course.

Gender differences in literacy in large-scale assessments
According to a meta-analysis based on 139 large-scale assessments of secondary school students across various countries (Lietz, 2006), male students tend to score 0.19 standard deviations lower in literacy than female students. In addition, results from the most recent Programme for International Student Assessment (PISA), the well-known international student assessment, show that 15-year-old male students scored, on average, almost 30 points lower (approximately one-third of a standard deviation) than female students across all participating countries in PISA 2018 (OECD, 2019). However, surprisingly, these gender differences in literacy among school students seem to become negligible in adulthood. The Programme for the International Assessment of Adult Competencies (PIAAC) finds no statistically significant gender differences in the average levels of literacy skills among adults (Borgonovi et al., 2021; Lechner et al., 2021; Solheim & Lundetrae, 2018); the same holds for the German National Educational Panel Study, or NEPS (Lechner et al., 2021).
Solheim and Lundetrae (2018) investigated gender differences in three large-scale literacy assessments, including the Progress in International Reading Literacy Study (PIRLS; 10-year-olds), PISA (15-year-olds), and PIAAC (a sub-sample of 16-24-year-olds), based on three Nordic countries (Denmark, Finland, and Sweden). Their results showed that the degree of gender differences (in favor of girls over boys) varied from 10-year-olds in PIRLS (Cohen's d = 0.24) to 15-year-olds in PISA (Cohen's d = 0.49) and 16-24-year-olds in PIAAC (Cohen's d = 0.03). Borgonovi et al. (2021) also investigated gender differences across the same data (PIRLS, PISA, and PIAAC) from twelve countries and found that the gender gap seems to appear as early as 9/10 years of age and increases towards 15/16 years of age, but it seems to subsequently decrease and become almost non-existent by the age of 26/27 years.
Moreover, a recent study by Borgonovi (2022) compared the total sample of 15/16-year-olds in PISA with a subsample of 16/17-year-olds in PIAAC to see whether the gender gap in literacy differs across the two datasets. The results showed that the gender gap in favor of females in the full PISA sample of 15/16-year-olds was approximately three times larger (Cohen's d = 0.347) than in the PIAAC sub-sample of 16/17-year-olds (Cohen's d = 0.104). As it is developmentally unlikely that the gender gap in literacy shrinks by two-thirds within a year or two, it is plausible to assume that other factors, such as assessment features (e.g., scoring method, test length, mode of administration, test characteristics), contribute to the differential results for gender gaps in literacy between PISA and PIAAC (Borgonovi, 2022).

The role of assessment features in gender differences in literacy
According to the selectivity hypothesis (Meyers-Levy & Loken, 2015), male and female individuals tend to have different levels of information processing and therefore apply different strategies to engage with written texts. More specifically, in comparison to males, females generally process incoming information more thoroughly and have a lower threshold for understanding data. This makes females more likely to detect, further elaborate, and utilize information that is less readily available and more distantly relevant when making evaluations. On the other hand, males tend to be more selective in their data processing and, compared to females, rely more on less effortful information processing or heuristics. Following this hypothesis, girls may understand texts that are long, continuous, and narrative better, while boys may understand texts that are short, non-continuous, and informational better (e.g., texts presented in diagrams, charts, tables, lists, or images that convey information in a more structured or segmented way).
In line with this assumption, a systematic review by Jabbar and Warraich (2022) also shows that girls generally prefer to read continuous and narrative texts (e.g., fiction), while boys prefer to read non-continuous and informational texts (e.g., instructions). Consequently, girls also show higher reading achievement for narrative/continuous texts than for informational/non-continuous texts (e.g., Kirsch et al., 2002; Mullis et al., 2003, 2007, 2012; OECD, 2010; Wagemaker et al., 1996). Furthermore, boys tend to do better on multiple-choice items than on constructed-response items requiring a written response (Lafontaine & Monseur, 2009; Roe & Taube, 2003; Routitsky & Turner, 2003; Schwabe et al., 2015), possibly because boys are more likely to skip answers on constructed-response items than on multiple-choice items (Solheim & Lundetrae, 2018).
Despite some empirical evidence for the role of assessment features in gender differences in literacy among school students, there is still a lack of studies on this topic, especially for adults. One exception is a study by Thums et al. (2021) that investigated the role of text types (informational vs. narrative texts) in gender differences in literacy among adults using the National Educational Panel Study (NEPS). They hypothesized that men have higher literacy for informational texts (e.g., non-fiction) than women, while women have higher literacy for narrative texts (e.g., fiction) than men. The results showed that men had slightly higher literacy scores for informational texts than women (Cohen's d = 0.14), although no gender differences in literacy scores were found for narrative texts (Thums et al., 2021). Another study by Thums et al. (2020) also used NEPS data to examine the role of gender-typicality of text content in gender differences in literacy among adults. As women and men have different profiles of interests, previous experiences, and subject knowledge, gender-typicality of text content was assumed to influence individuals' competency and familiarity in dealing with texts depending on their gender. This is in line with socio-cultural theory, which suggests that social expectations (e.g., gender roles) influence the behavior of men and women through social rewards and punishments for conforming or not conforming to these expectations (Meyers-Levy & Loken, 2015). However, Thums et al. (2020) found no evidence that gender-typicality of text content (either female- or male-typical) contributes to gender differences in literacy among adults.

The present study
The goal of the present study is to investigate the relationship between various assessment features and gender differences in literacy test scores based on German data from the first cycle of PIAAC. We examine six assessment features of the PIAAC literacy items that were assumed to be relevant for gender differences in literacy test scores according to previous research. These assessment features are (1) text format, (2) text topic, (3) cognitive strategies, (4) text length, (5) numerical content of the text, and (6) gender-typicality of text content.
In line with the selectivity hypothesis and previous research (e.g., Thums et al., 2021), we expect that text format and text length will contribute to gender differences in literacy scores: more specifically, women will tend to have higher literacy for continuous and longer texts, while men will tend to have higher literacy for non-continuous and shorter texts. In addition, regarding cognitive strategies, also following the selectivity hypothesis, we expect that men will tend to access and identify information in a text better than women, whereas women will tend to interpret, evaluate, and reflect on information in a text better. Regarding the numerical content of the text, we expect that men will tend to have higher literacy for texts with higher numerical content than women, in line with previous research showing gender differences in numeracy skills in favor of men (e.g., OECD, 2013a). For gender-typicality of text content, in line with socio-cultural theory, we expect that men will tend to have higher literacy for male-typical content, whereas women will tend to have higher literacy for female-typical content. For text topics, we do not specify a hypothesis due to a lack of theoretical assumptions.

Data: PIAAC (Germany)
PIAAC is an international multi-cycle large-scale assessment initiated by the OECD. It measures the skills of the adult population (16-65 years) in the participating countries in the competency domains of literacy, numeracy, and problem-solving. All countries are required to implement probability-based samples; thus, the results are representative of their target populations. The PIAAC interview was carried out face-to-face and consisted of an extensive background questionnaire administered by an interviewer and a cognitive assessment monitored by the interviewer (without any timing restriction).
The first cycle of PIAAC implemented a multistage adaptive assessment design, as displayed in Fig. 1 (OECD, 2013c). Following this design, respondents only worked on a subset of the item pool from one or two domains (e.g., literacy and problem-solving). The assessment was computer-based by default, with an optional paper-based version for respondents who were unable or unwilling to do the computer-based assessment. A so-called core module (Computer-based Core 2 in Fig. 1), composed of six cognitive items of relatively low difficulty (three literacy and three numeracy items), was used for the subsequent routing; respondents who failed this core were not administered the main assessment but only completed reading components tasks, which are simple reading tasks for adults with low literacy skills.
The subsequent analyses are carried out using the German PIAAC data (Rammstedt et al., 2016), which were collected from 2011 to 2012 (for a description of the implementation of PIAAC in Germany, see Zabal et al., 2014). To allow for a more straightforward interpretation of results without having to consider mode effects, only the computer-based assessment data (right-hand side of Fig. 1) were included (a sample of 4,512 out of the full sample of 5,456 respondents).

Literacy skills
Literacy in PIAAC is defined as "understanding, evaluating, using and engaging with written texts to participate in society, to achieve one's goals, and to develop one's knowledge and potential" (Jones et al., 2009; OECD, 2013b). The literacy items include simpler tasks, such as decoding written words and sentences, but also comprehending, interpreting, and evaluating complex texts. The items do not involve writing.
As stated above, in the computer-based assessment, not all respondents worked on literacy items, and those who did were not necessarily presented with the same items. Based on the survey design, each individual participating in PIAAC received a subset of items in at least one of the three skill domains. Two-thirds of the PIAAC respondents were administered the literacy assessment. Among those, each respondent was assigned a subset of 20 literacy items selected from an item pool comprising a total of 58 literacy items. The allocation of assessment items was not completely random but depended on individual information obtained from the PIAAC background questionnaire (education and mother tongue), performance in the initial part of the skill assessment (the core module), and a random element (for a detailed overview, see OECD, 2013b).
Although not all respondents worked on the same items or on items in all domains, the International PIAAC Consortium estimated their proficiency in each skill domain using methodology based on item response theory (IRT; Yamamoto et al., 2013). In the population model, latent proficiency was the dependent variable, and the observed item responses and background variables from the PIAAC questionnaire (e.g., education, age, employment status) were used as predictors of proficiency. To take measurement error into account, an empirically derived distribution of proficiency values conditional on item response patterns and background variables was constructed for each respondent. Ten plausible values (PVs), which can be treated as multiple imputations, were randomly drawn from this posterior distribution for each respondent in each domain. We use all ten PVs for literacy in our empirical analyses.
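Because the ten PVs act as multiple imputations, any statistic has to be computed once per PV and the per-PV results pooled, with the spread across PVs feeding into the standard error. The pooling step (Rubin's rules) can be sketched as follows; the function name and all numbers are purely illustrative, not PIAAC output:

```python
import numpy as np

def combine_plausible_values(estimates, variances):
    """Pool an analysis run separately on each plausible value (PV)
    using Rubin's rules for multiple imputations."""
    estimates = np.asarray(estimates, dtype=float)  # per-PV point estimates
    variances = np.asarray(variances, dtype=float)  # per-PV sampling variances
    m = len(estimates)

    pooled = estimates.mean()                  # combined point estimate
    within = variances.mean()                  # average within-PV variance
    between = estimates.var(ddof=1)            # variance across the PVs
    total = within + (1 + 1 / m) * between     # total imputation variance
    return pooled, np.sqrt(total)

# Illustrative per-PV estimates of a single coefficient (hypothetical values).
est = [0.131, 0.135, 0.133, 0.136, 0.132, 0.134, 0.130, 0.137, 0.133, 0.135]
var = [0.0004] * 10
coef, se = combine_plausible_values(est, var)
```

The pooled standard error is never smaller than the average per-PV standard error, which is how the measurement uncertainty carried by the PVs enters the inference.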

Assessment features
Based on its definition of literacy for PIAAC, the PIAAC literacy framework expands on key elements of the literacy construct that were crucial to the development of the literacy assessment items and the subsequent description of the literacy proficiency levels (for detailed information, see Jones et al., 2009; OECD, 2013b). The following literacy assessment features from the framework were considered in our research:
(1) Text format (4 categories): Items were classified by the format of the stimulus text as (a) continuous, (b) non-continuous, (c) mixed, or (d) multiple texts.
(2) Text topic (4 categories): Items were classified by the topic of the stimulus text as relating to (a) community, (b) education, (c) personal, or (d) work contexts.
(3) Cognitive strategies (3 categories): Items were classified by the cognitive strategy required as (a) access and identify, (b) integrate and interpret, or (c) evaluate and reflect.
In addition to the above-mentioned assessment features, the authors reviewed all German PIAAC literacy items and created additional categorizations:
(4) Text length (2 categories): Item text length was differentiated into (a) short texts (less than one page on the computer screen) and (b) long texts (more than one page on the computer screen).
(5) Numerical content (2 categories) of the item stimulus and of the question itself: The numerical content of the stimulus (the text/tables to which the questions refer) and of the questions was evaluated and classified as (a) none/low or (b) medium/high. None/low numerical content means that there are no or only a few numbers (in the stimulus or the question, respectively), and these are not key to the correct response. Medium/high numerical content in the stimulus means that at least half of the text contains numerical information, for example in a graph/diagram. Medium/high numerical content in the question means that the question itself contains numerical information or that the response to the question is a numerical entity.
(6) Gender-typicality of content (3 categories) of the item stimulus and of the question itself: Gender-stereotypical content of the text and questions was evaluated as (a) male-typical, (b) female-typical, or (c) gender-neutral, depending on the gender stereotypes in the text content. For example, an item on childcare is considered female-typical, while an item on car mechanics is considered male-typical. There were no questions with female-typical content, so this category is not included in the analyses.
Figure 2 shows an example of a PIAAC 2012 literacy item (international English source version) and its assessment features.Table 3 in Appendix A shows the distribution of assessment features across the literacy items in our sample.

Statistical analyses
The goal of our analyses is to examine whether there is an interaction between gender and different assessment features of literacy items. We employed an item-level linear probability model to analyze the PIAAC 2012 data, a choice informed by the unique structure and characteristics of the dataset. The dependent variable in our models is a binary variable indicating whether an individual solved a literacy item correctly (1 = correct, 0 = incorrect); "incorrect" comprises a wrong answer or no answer at all. The main independent variable in our analyses is a binary indicator of gender (1 = female, 0 = male). As linear probability models and average marginal effects obtained from logistic regressions deliver practically identical results, the coefficients estimated in our models can be interpreted as the difference in the probability of solving an item correctly for women compared to men. A unique feature of PIAAC is its adaptive design for the assessment of skills. Due to this design, different adults work on different items, i.e., adults who worked on different items are not necessarily comparable (for more information, see "Measures" and Appendix B). To account for these selection effects, we control for literacy skills in all our models by including the ten PVs on literacy provided by the international PIAAC Consortium (see "Measures" and Yamamoto et al., 2013). This enables us to look at the difference in performance on an item once we have controlled for individual proficiency. We perform separate analyses for each of the text features in the literacy assessment (e.g., for text length, across all long or short items). In this way, we aim to capture (the size of) the relationship between gender and the respective assessment features. To account for the complex sampling design of PIAAC, we used sampling and replicate weights in all our analyses.

Fig. 2 Sample item of PIAAC 2012 literacy item and its assessment features. Adapted from https://www.oecd.org/skills/piaac/Literacy%20Sample%20Items.pdf
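For concreteness, the item-level model for one assessment feature amounts to an OLS fit of a gender × feature interaction with a proficiency control. The sketch below uses hypothetical variable names and random stand-in data; the actual analysis uses the PIAAC variables, all ten plausible values, and sampling/replicate weights:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000  # hypothetical respondent-item observations in long format

female = rng.integers(0, 2, n)                  # 1 = female, 0 = male
noncontinuous = rng.integers(0, 2, n)           # 1 = non-continuous text format
pv_literacy = rng.normal(270.0, 45.0, n)        # one plausible value (of ten)
correct = rng.integers(0, 2, n).astype(float)   # 1 = item solved correctly

# Linear probability model: intercept, gender, text format,
# gender x format interaction, and proficiency as a control.
X = np.column_stack([
    np.ones(n), female, noncontinuous, female * noncontinuous, pv_literacy,
])
beta, *_ = np.linalg.lstsq(X, correct, rcond=None)

# beta[3] is the interaction term: the additional female-male difference
# (in probability points) specific to non-continuous items.
interaction = beta[3]
```

In models of this form, the interaction coefficient is what is read as the gender gap for a given feature category; with independent random data, as here, it hovers around zero.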

Descriptive statistics
Table 1 shows the distribution of key sociodemographic characteristics in our sample for men (n = 2272) and women (n = 2240).
Men in our sample score slightly higher on literacy skills compared to women (278 versus 275 points) on a scale ranging from 0 to 500 points. Moreover, men have a slightly higher share of high (tertiary) education and a slightly higher share of low education (less than upper secondary education) compared to women. While 75% of men are employed, only 55% of women are employed. Concerning age, first language, and children younger than 12, there are only minor differences between the men and women in our sample.

Table 1 Sample characteristics and descriptive statistics
Literacy skills are measured on a scale ranging from 0 to 500 points (for more information, see "Measures" or OECD, 2013b). a Low education refers to respondents with less than upper secondary education, medium education refers to respondents who completed upper secondary education, and high education refers to tertiary education. b Individuals are considered native speakers if their first language is German. c Individuals are "employed" if they had a paid job of at least one hour in the week preceding the interview. d The table shows the frequency of respondents' reading activities at work or in everyday life; answers were given on a five-point scale from 1 (never) to 5 (every day).
The lower part of the table also shows gender differences in reading behavior. Overall, there are only a few differences between men and women. Men are slightly more likely to read instructions or journals and publications. Women read books more often than men.

Gender differences in literacy items with different assessment features
Table 2 shows the results from our linear probability models. We include interactions between the gender dummy variable and the different categories of the assessment features to infer the gender differences on items with various assessment features.
The results by text format show that men had a 13.4 percentage point higher probability of solving items with a non-continuous text format correctly than women. There were no significant gender differences in the probability of solving items with other text formats (continuous, mixed, or multiple texts).

Table 2 Gender differences in literacy by assessment features
The coefficients reported in column (3) are the coefficients from a linear probability model predicting proportion correct on the literacy items. Analyses are run separately for each assessment feature. All models control for proficiency in literacy by including the ten plausible values. Sample weights are applied, and the complex sampling design is taken into account.

The results by text topics show that men had a 3.4 percentage point higher probability of solving items on education topics correctly. There were no significant gender differences in the probability of solving items with other text topics (community, personal, or work). Men also performed slightly better on "integrate and interpret" items than women (but only by 1.5 percentage points), while our results show no significant gender differences in the probability of solving items requiring other cognitive strategies (access or evaluate). Looking at performance by text length, we find that men have a 9.4 percentage point higher probability of solving short text items correctly than women; for long text items, there were no significant gender differences. Our results also show that men have a 4.6 percentage point higher probability of solving items with medium/high numerical content in the question correctly than women. However, there were no gender differences with regard to the numerical content of the stimulus. Our results concerning the gender-typicality of content in the stimulus and questions of items did not show any significant gender differences.

Discussion
The goal of the present study was to investigate the relationship between various assessment features and gender differences in literacy test scores in a sample of adults in Germany. Our study contributes to the body of literature on gender differences in literacy assessments in several ways. Our findings suggest that some, though not all, assessment features, especially text format, text length, and numerical content in questions, seem to be relevant for gender differences in literacy performance. In line with our hypothesis, our results indicate that men tend to perform better than women on literacy items with short and non-continuous texts. This result is consistent with previous research showing that men prefer to read for information and knowledge acquisition, while women prefer to read for entertainment (Groeben, 2004), and that men tend to have higher literacy scores on informational texts than women (Thums et al., 2021).
Our findings are also in line with the selectivity hypothesis, according to which individuals of different genders exhibit varying degrees of information processing, leading them to employ distinct strategies when interacting with written materials. In particular, compared to their male counterparts, females generally exhibit more comprehensive processing of incoming data and a lower threshold for grasping information. This predisposes females to detect, further analyze, and use less accessible and more remotely relevant information during evaluations. Conversely, males are inclined to be more selective in their data processing and depend more on simplified information processing methods or heuristics than females (Meyers-Levy & Loken, 2015).
Furthermore, in line with our hypothesis, our findings also indicate that men tend to score higher than women on literacy items with higher numerical content in the questions. This is also in line with previous research showing gender differences in numeracy skills in favor of men (e.g., OECD, 2013a). Moreover, text topics, gender-typicality of content, and cognitive strategies did not contribute much (less than five percentage points) to gender differences in literacy scores. One possible reason for this finding may be that the literacy items in PIAAC were carefully developed to be appropriate for different subgroups in the adult population, not only across genders but also across cultures and languages. Therefore, most texts used in the assessment are not gender-typical, and they cover a wide range of topics that are relevant to individuals from diverse backgrounds.
Our findings also suggest important practical implications for educational practitioners, institutions, and policy-makers. Our analysis contributes to the understanding of how assessment features, such as text length (short), text format (non-continuous), and the presence of numerical content in questions, contribute to some of the gender differences in literacy performance. While it is standard practice to include a diversity of test items and broad content coverage in literacy assessments, our findings highlight the critical need for continuous evaluation and refinement of these practices. This approach ensures that assessments not only cater to a broad range of literacy skills but also minimize potential biases that could skew the understanding of gender differences in literacy skills.
By emphasizing the need for ongoing research into the specific role of assessment characteristics in literacy performance, we aim to support educational practitioners and test developers in enhancing the fairness and accuracy of literacy assessments. Furthermore, our study underscores the importance for schools and policymakers of adopting a nuanced interpretation of gender gaps in literacy, informed by an awareness of how assessment design can influence these disparities. Ultimately, such informed approaches can guide more effective interventions and policies toward achieving gender equality in literacy across educational settings and societies.

Limitations and future directions of the present study
Despite the contributions and implications mentioned above, our study has several limitations that may be addressed in future research. First, the methodological challenges posed by PIAAC's design necessitated a thoughtful approach to our analysis, leading us to employ item-level linear probability models. This decision, while diverging from more conventional methods commonly used in research on group invariance, differential item functioning (DIF), and item bias, was instrumental in navigating the complexities inherent in the PIAAC dataset. Our chosen methodology allowed us to analyze the interplay between assessment features, gender, and literacy performance despite these constraints. Acknowledging the limitations of our approach, we suggest that future research explore alternative statistical models and methodologies capable of addressing the nuanced challenges presented by datasets like PIAAC.
Second, our study is based on a sample of adults in Germany; therefore, our results may not be generalizable to other countries and languages. For example, the role of gender-typicality of topics in gender differences in literacy performance may be more pronounced in countries that value more traditional gender roles in society than Germany does. Future research may replicate our analyses for other countries to cross-validate our findings across cultures and languages, possibly using the international PIAAC data. Third, the literacy test in the PIAAC does not contain any test items that require a written response, whereas other large-scale literacy assessments such as PISA do. Although constructed-response items significantly extend the measurement scope and improve ecological validity, they are more difficult to implement in cross-national large-scale assessments. Previous research suggests that boys tend to do better on multiple-choice items than on constructed-response items requiring a written response (Lafontaine & Monseur, 2009; Roe & Taube, 2003; Routitsky & Turner, 2003; Schwabe et al., 2015), possibly because boys are more likely to skip answers for constructed-response items than for multiple-choice items (Solheim & Lundetrae, 2018). Such findings have not yet been replicated for a sample of adult readers. Future studies may investigate the role of response formats (constructed vs. multiple-choice) in gender differences in literacy test scores.
Finally, our results showed that men tend to perform significantly better than women on test items based on short and non-continuous texts. According to Solheim and Lundetrae (2018), the literacy assessment in PIAAC 2012 contains an equal number of continuous and non-continuous texts. In contrast, the literacy tests from PIRLS 2011 and PISA 2012 contain more continuous than non-continuous texts. This comparison suggests that large-scale literacy assessments for school students (e.g., PIRLS and PISA) often focus more on literacy in continuous texts than large-scale assessments for adults (e.g., PIAAC) do. This may be one possible reason why the gender gaps (in favor of female participants) are often much larger in childhood and adolescence than in adulthood. It is beyond the scope of the present study to examine whether and how assessment features help explain differential findings on gender gaps in literacy across age cohorts. However, we want to encourage future research to further investigate this topic to better understand the development of gender gaps in literacy over the life course.

Conclusions
The present study investigated the role of various assessment features in gender differences in literacy performance in a large sample of adults using the German PIAAC data from 2012. Our findings suggest that gender differences in literacy assessment seem to partly depend on the types of assessment features. Text format, text length, and numerical content of text (in questions) seem to matter for gender differences in a literacy assessment, whereas text topics, cognitive strategies, and gender-typicality of texts seem to matter less. Our study underlines the importance of incorporating a comprehensive variety of test items and content in literacy assessments, ensuring that they are accessible and equitable for all test-takers. Educational researchers and practitioners must remain vigilant about the contributions of assessment features to gender differences in literacy performance. Emphasizing continuous research and refinement of these assessments will help minimize gender biases and improve the accuracy of literacy evaluations.

B: Reshaping the data
A special characteristic of the skill assessment in PIAAC is that not all respondents worked on the same literacy items. Instead, they were routed to different sets of items with varying difficulty, depending on individual information obtained from the PIAAC background questionnaire (education and mother tongue), performance in the first module (core) of the skill assessment, and a random element (for a detailed overview, see OECD, 2013b). Thus, the data available at the respondent level contains missing values for some individuals on some items. Table 4 provides a short example of how the PIAAC data structure could look in a data set with six individuals for whom we have information on gender and who each worked on two out of four items.
Given this assessment design, we cannot simply compare the share of men and women who solved a particular item correctly because only a specific group of men or women worked on an item and the routing to the items is not completely random. For this reason, we have reshaped the data to be available at the item-person level rather than at the person level. To continue with our example above, consider that the four items differ in three dimensions of text characteristics: strategy, context, and format (Table 5).
We combine individual and item information and reshape the data such that individuals are "nested" in items (similar to country-level analysis, where individuals are nested in countries) (Table 6).
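As a minimal sketch of this reshaping step (with hypothetical column and item names, not the actual PIAAC variables), the move from a wide person-level table to a long item-person table, with item characteristics attached, could look like this in pandas:

```python
import pandas as pd

# Person-level data: one row per respondent, one column per item
# (missing where the respondent was not routed to that item).
persons = pd.DataFrame({
    "id": [1, 2, 3],
    "female": [1, 0, 1],
    "item_a": [1, 0, None],
    "item_b": [None, 1, 0],
})

# Item-level data: one row per item with its assessment features.
items = pd.DataFrame({
    "item": ["item_a", "item_b"],
    "format": ["continuous", "noncontinuous"],
})

# Reshape to long format: one row per person-item combination,
# dropping combinations the respondent never worked on.
long = persons.melt(
    id_vars=["id", "female"], var_name="item", value_name="correct"
).dropna(subset=["correct"])

# Attach item characteristics, so each row carries both person- and
# item-level information ("individuals nested in items").
long = long.merge(items, on="item")
```

In this toy example, three respondents each worked on at most two items, so the long table contains four valid person-item rows, each carrying gender, correctness, and the item's text format.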
After reshaping the data from person-level to item-person-level, we can run analyses for the full sample using performance on certain literacy items (the variable "correct" in our example) even though some "individual-item" combinations do not exist, as we still have a large number of individuals per item. We account for selection by controlling for proficiency in literacy, measured with 10 plausible values on a continuous scale ranging from 0 to 500 points (for more information on the test scores and plausible values, see OECD, 2013b).
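To illustrate the item-level linear probability model on such a long table, here is a simplified sketch using fabricated toy data (not the PIAAC estimates): the 0/1 correctness indicator is regressed on gender, an item feature, their interaction, and a centered proficiency control, fitted by ordinary least squares with numpy.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000  # person-item rows after reshaping (toy size)

female = rng.integers(0, 2, n)    # respondent gender (1 = female)
noncont = rng.integers(0, 2, n)   # item feature: noncontinuous text format
prof = rng.normal(270, 50, n)     # proficiency control (e.g., averaged plausible values)

# Simulated outcome with a male advantage on noncontinuous items
# (arbitrary toy effect sizes, chosen only for illustration).
p = 0.5 + 0.1 * (1 - female) * noncont + 0.002 * (prof - 270)
correct = (rng.random(n) < np.clip(p, 0, 1)).astype(float)

# Linear probability model:
# correct ~ female + noncont + female:noncont + centered proficiency
X = np.column_stack([np.ones(n), female, noncont, female * noncont, prof - 270])
beta, *_ = np.linalg.lstsq(X, correct, rcond=None)

# beta[3] estimates the female x noncontinuous interaction, i.e. the
# gender difference in the probability of solving noncontinuous items.
```

Note that the actual PIAAC analysis additionally has to combine estimates across the 10 plausible values and use the survey weights and replicate weights; this sketch omits those steps for brevity.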

Fig. 1
Fig. 1 Scheme of the PIAAC interview workflow. The figure shows a schematic workflow of the PIAAC 2012 interview adapted from OECD (2013c)

(1) Text format (4 categories): PIAAC differentiates four text formats: (a) continuous texts that are made up of sentences and paragraphs, (b) non-continuous texts that may be organized in a matrix format or around typographic features, (c) mixed texts that involve combinations of continuous and non-continuous elements, and (d) multiple texts that consist of juxtaposing or linking independently generated texts, such as an e-mail exchange.
(2) Text topics (4 categories): PIAAC differentiates four text topics: (a) texts related to work and occupation (e.g., about job search or wages), (b) texts related to personal topics (e.g., home, family, or health), (c) texts related to society and community (e.g., about public services or community activities), and (d) texts related to education and training (e.g., learning opportunities for adults).
(3) Cognitive strategy (3 categories): PIAAC differentiates three cognitive strategies that characterize literacy tasks: (a) accessing and identifying (e.g., locating information in a text), (b) integrating and interpreting (e.g., understanding the relationship between different parts of a text), and (c) evaluating and reflecting (e.g., relating the information in the text to external knowledge).

Table 3
Distribution of assessment features across the literacy items (%)

Table 4
Sample data set at the person level

Table 5
Sample item characteristics

Table 6
Sample data set at the item-person-level