Subject-specific strength and weaknesses of fourth-grade students in Europe: a comparative latent profile analysis of multidimensional proficiency patterns based on PIRLS/TIMSS combined 2011

In 2011 the Progress in International Reading Literacy Study (PIRLS) and the Trends in International Mathematics and Science Study (TIMSS) were conducted at the fourth grade in a number of participating countries with a shared representative sample. In this article we investigate whether multidimensional proficiency patterns exist across the competency domains. In order to derive proficiency patterns across the reading (PIRLS), mathematics and science (TIMSS) competence domains, latent profile analyses (LPA) of students’ plausible values were conducted. For this, the grade four student samples from 17 countries were combined and analyzed. The international reference model that resulted from this analysis was then applied with constraints to all 17 countries separately so that substantive comparisons between countries became possible. To describe and compare the differences between national profiles, a classification system was developed and applied to all countries’ profile patterns. As a result of this international LPA, seven groups of learners were identified. The profiles were approximately equidistant and parallel. For all countries we find that achievement across domains can be explained by a general level of achievement rather than by subject-specific strengths or weaknesses of learners. Subject-specific strengths and weaknesses can nevertheless be identified, although for most countries (Malta and Northern Ireland being the exceptions) they are rather small. For only about half of the countries, a rather uniform pattern of subject-specific strengths and weaknesses can be found on all competence levels. The subject itself varies between countries. In the other countries, high, intermediate and low achievers differ in their relative subject-specific strengths and weaknesses. The results suggest that differences in average achievement in TIMSS and PIRLS should be interpreted with caution at the country level as well. International comparative studies should further investigate potential reasons for the differences between countries.


Background
Results in reports of national and international large-scale assessment studies are usually contextualized with countries' achievement in specific domains compared against international benchmarks (Bos et al. 2012a, b; Martin and Mullis 2013). Yet rough comparisons (for example, which country has the highest proportion of students meeting the highest benchmark in mathematics, science or reading, or which country ranks below the international average in all three subjects) can often be insensitive to the texture of within-country, and even between-country, variation in achievement. For example, how is achievement in one domain related to another? Do the under-achieving students have specific strengths compared to high achievers? Do higher-achieving or lower-achieving students show similar patterns of strengths or weaknesses across countries? With this paper we aim to describe and compare the achievement of Grade 4 learners between countries and across domains by modelling multidimensional proficiency profiles, making use of the combined datasets of the Progress in International Reading Literacy Study (PIRLS) and the Trends in International Mathematics and Science Study (TIMSS) 2011.

A comprehensive perspective on outcomes
Students' academic competencies are the results of a complex interplay between factors located at the country, the school, the classroom, and the student level. A lot of these factors shape inter-individual differences in student achievement in a rather similar way (Bergold et al. 2016). This might be a possible explanation for the finding that students in countries reaching certain levels in one achievement domain tend to reach similar results also in other domains (e.g., Mullis et al. 2012a, b).
Further, approaches to "interdisciplinary" or cross-subject teaching (Petersen 2000; Gudjons and Traub 2012) emphasize that a student understands his or her reality not as contained within the bounds of one subject or another but, instead, across a set of applicable, interdisciplinary constructs (Lenzen 1996; Jonen and Jung 2007). This idea is consistent with the psychological perspective, which suggests that the development in competence domains is based on transferable skills and dispositions common to all subjects (Weinert 1999). From this perspective, an individual's ability to learn is understood as a domain-general competence, which brings together a set of cognitive skills and strategies as well as prior knowledge and abilities. Regardless of which perspective is taken, if learning occurs across domains (Baumert et al. 2000), it can be assumed that primary school students' acquisition of skills in reading, mathematics, and science develops at least partly in parallel (see Schurig et al. 2015).
Empirical research results support the assumption that learning in one subject is related to learning in another, and hence multidimensional, mutually dependent achievement patterns should be found. For example, the reading, mathematics, and science scores from PISA were found to inter-correlate from r = .75 to r = .81 (Reilly 2012) or even from r = .95 to r = .99 (Rindermann 2007). In TIMSS and PIRLS 2011, the correlations for the German sample were lower but still substantial, ranging from r = .54 to r = .74 (Bos et al. 2012a).

Table 1 Percentages (SE) of 4th-grade students meeting high, intermediate, and low benchmarks and national means and standard deviations by country
a All numbers from Mullis (2013); b All numbers from Mullis et al. (2012b); c All numbers from Mullis et al. (2012a)

What these correlations suggest is that high or low achievement in one subject is closely related to high or low achievement in the others. They also provide grounds for further investigation of achievement as a multidimensional construct. Thus the question arises how such relationships can be modeled and further investigated in an international comparative context.

Profiles of country specific strength and weaknesses in student achievement
Many studies using data from international large-scale assessments have investigated and established well-defined profiles of students within one specific content domain or subject (Lie and Roe 2003; Grønmo et al. 2004; Olsen 2005; Angell et al. 2006). These studies show that the investigation of proficiency profiles enhances the knowledge of differences between countries and therefore provides valuable knowledge for the comparison of education systems. For example, Olsen (2005) models profiles to study cross-country variations in the response patterns in mathematics achievement for students participating in PISA 2003. With this approach Olsen was able to distinguish a so-called Nordic profile from a profile for English-speaking countries. For science, Olsen (2005) found evidence for a North-West European profile using science cognitive items from PISA 2003.
So far, in all of these studies cluster analyses were performed based on the manifest item scores to derive domain-specific cross-country differences in so-called item-specific strengths and weaknesses within a single achievement domain. It is far beyond the scope of this paper to discuss the methodological challenges that arise when only the item responses of the students are used as the dependent variable instead of the plausible values (and vice versa). However, it seems worthwhile to study which profile solutions result when the plausible values instead of the item responses are used as the dependent variable. Also, for investigations of profiles across achievement domains, the multidimensional nature of the competence domains should be taken into account when a profile solution is derived, because the dimensions are usually highly correlated (see above).
Using an approach based on the plausible values of the students for the PIRLS/TIMSS 2011 combined sample, Mullis (2013) calculates the proportion of 4th-grade students who meet low and high international benchmarks in one, two, or all three of the competence domains by country. Mullis notes that, on the one hand, countries that have large proportions of 4th-graders meeting the high international benchmark in all three domains are best equipping their students for later learning. On the other hand, if any given country has a relatively small proportion of students meeting the low international benchmark in any given subject, these would be the students who are considered "left behind" to the extent that schools are failing to teach them the skills that support later achievement. Combined with the information given in the international PIRLS/TIMSS reports (Mullis et al. 2012a, b; see Table 1) on mean achievement and proportions of students reaching different International Benchmarks, it can be found that in ten of the 17 EU-member countries participating in this study (Austria, Czech Republic, Finland, Hungary, Italy, Poland, Slovak Republic, Slovenia, Spain and Sweden), students seem to show a consistent subject-specific weakness in mathematics.
In contrast, in Ireland, Lithuania, Malta, Northern Ireland and Portugal, students show a consistent subject-specific weakness in science. For Germany, a relative strength in reading is reported and for Romania a relative strength in science. However, taking into account the differences in standard deviations for countries, it remains unclear to what extent these mean findings can be generalized to the different groups of low or high achievers within the countries.
While this is a step in the direction of describing how achievement looks across all domains and of determining which countries are faring well at preparing the majority of their pupils for more advanced learning, latent profile analyses (LPA) offer a different methodological approach that allows for the classification of cross-domain or multidimensional proficiency profiles within and across countries. In other words, latent profile analyses of achievement allow the classification of groups of students beyond those who are "well equipped" or those "left behind." These profiles, in turn, can be not only evaluated in terms of the quality of the selected classifications but also analyzed according to their subject-specific strengths and weaknesses as well as according to their relationships with social background characteristics (Schurig et al. 2015).
To this end, Bos et al. (2012a, b) model latent profiles of student achievement in reading, mathematics, and science for 4th-grade students in Germany. The authors found seven invariantly rank-ordered profiles based on the similarity of students' cross-domain achievement and then further studied relationships with background variables, such as gender, cultural and socioeconomic characteristics (Bos et al. 2012a, b). Gender differences only occurred in the group of the very high achieving students, as significantly more boys than girls were found in this group. In terms of cultural and socioeconomic characteristics, all seven proficiency profiles were distinguishable, and a close relationship between achievement and socioeconomic background was observed. In other words, at least in Germany, lower-achieving students across all domains look different in their social background from higher-achieving students.
In this paper we apply this same procedure to a pooled sample of 4th-graders from 17 European countries that participated in TIMSS/PIRLS combined. We identify proficiency profiles across the countries. We then fit this international European model with constraints to all 17 countries separately to compare the multidimensional proficiency profiles between countries. We then analyze the derived profiles in more detail to identify subject-specific strengths and weaknesses within and across profiles. These country-specific patterns are then compared across countries and the relationships to well-known results of educational research are shown. Hence, this paper aims to answer the following main questions: (1) How many profiles describing achievement patterns of students in the participating EU countries can be separated? (2) What is the distribution of learners assigned to these profiles? (3) How are students distributed across the profiles with regard to their background characteristics, such as parental educational background, language spoken at home, positive attitude towards learning and domain-specific self-concepts? (4) How are high- and low-achieving students in different European countries distributed? (5) Are there substantial differences in the multidimensional proficiency profiles across countries? (6) Can differences in the domain-specific strengths and weaknesses between countries be observed?
(Foy 2013) for which achievement measures are obtained using a multidimensional scaling approach combining all three subjects. Achievement means therefore differ from those in the TIMSS and PIRLS international report (shown in Table 1).

Sample
We used representative data from the Progress in International Reading Literacy Study (PIRLS) 2011 and the Trends in International Mathematics and Science Study (TIMSS) 2011. The two studies are usually conducted independently of each other, in different cycles and with a focus on different outcomes. In 2011, however, the studies were conducted jointly in some countries, testing students in mathematics, science, and reading.
Overall, 37 countries and benchmark participants participated in PIRLS/TIMSS combined with samples of 4th graders. We used the data from k = 17 countries that both took part in TIMSS/PIRLS 2011 combined and were members of the European Union when data collection took place (i.e., Austria, Czech Republic, Finland, Germany, Hungary, Ireland, Italy, Lithuania, Malta, Northern Ireland, Poland, Portugal, Romania, Slovak Republic, Slovenia, Spain, and Sweden) as these countries follow a common strategic educational framework (Education and Training 2020 program; European Commission 2015). The 17 samples resulted in a total sample of n = 74,868 4th grade students from 2704 schools. Due to the sampling procedure implemented in PIRLS and TIMSS, it was ensured that the students in the different countries were all comparable regarding their age and their amount of schooling. All countries applied a strict sampling procedure to allow analyses of nationally representative data. Sample selection was strictly monitored in every country to preserve high quality sampling standards (for more details regarding the sampling procedure see .

Data
The combined international data sets of fourth grade students of all countries that participated in TIMSS/PIRLS 2011 are used. From these data sets, the country-specific data files ASG***B1 and ASH***B1 are used, where *** stands for a country-specific code, ASG are the student background data files, and ASH are the corresponding home background data files. These data files are first merged across data file sources and then combined across countries. For the multidimensional proficiency profiles of the students from the 17 countries, the plausible values in reading achievement, mathematics achievement and science achievement (which are part of the ASG***B1 data sets) are analyzed. The scaling procedure behind these values differs from the one used for the international reports (Mullis et al. 2012a, b) as it preserves the correlational structure across the three subjects (multidimensional model; for a detailed description of the scaling procedure and the use of plausible values, see Foy 2013). Each achievement scale is set to its own metric with an international mean of 500 and a standard deviation of 100. The distribution of the students across countries together with sample statistics about characteristics of the students is shown in Table 2.

Latent profile models
For deriving the multidimensional proficiency profiles of the fourth grade students, latent profile analyses were conducted (Lazarsfeld and Henry 1968). In this method, the multidimensional marginal distribution of achievement values of all students is separated into distinct conditional distributions, which are assumed to mix up to the marginal distribution (thus this procedure is also known as a mixture model). The distinct conditional distributions are called latent profiles, and these are normally characterized by their conditional means on the dimensions of the multidimensional distribution and by the percentage of students forming the profiles. However, as there are (at least theoretically) infinitely many possible separations of the marginal distribution into conditional distributions, and because the (marginal as well as conditional) distributions are probabilistic in nature, the derived number of latent profiles and the assignment of the students to these profiles are probabilistic as well. Thus, for judging which of these different possible separations of the marginal distribution into mixture distributions should be chosen, various fit criteria are available. In this study, the Akaike information criterion (AIC; Bozdogan 1987) and the sample-size adjusted Bayesian information criterion (CBIC; Schwarz 1978) are used. Smaller values of these criteria indicate a better fit of the assumed mixture. In addition, the classification error rate, entropy and pseudo-reliability for the final solution are calculated.
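The fit and classification statistics named above can be illustrated with a minimal Python sketch. It assumes the textbook definitions of the information criteria and of relative entropy as commonly reported for mixture models; the function names and inputs are hypothetical, and the sketch is not taken from the software used in this study. Which penalty term matches each acronym used in the text is left open, so several common variants are returned.

```python
import math


def information_criteria(log_likelihood, n_params, n_obs):
    """Common information criteria for a fitted mixture model.

    Smaller values indicate a better trade-off between fit and complexity.
    """
    deviance = -2.0 * log_likelihood
    return {
        "AIC": deviance + 2 * n_params,
        "BIC": deviance + n_params * math.log(n_obs),
        "CAIC": deviance + n_params * (math.log(n_obs) + 1),
        # Sample-size adjusted BIC: n is replaced by (n + 2) / 24.
        "SABIC": deviance + n_params * math.log((n_obs + 2) / 24),
    }


def relative_entropy(posteriors):
    """Relative entropy in [0, 1]; 1 means perfectly separated profiles.

    posteriors: one list of profile-membership probabilities per student.
    """
    n_obs = len(posteriors)
    n_profiles = len(posteriors[0])
    total = 0.0
    for row in posteriors:
        for p in row:
            if p > 0:
                total -= p * math.log(p)
    return 1.0 - total / (n_obs * math.log(n_profiles))


def classification_error(posteriors):
    """Expected share of misassigned students under modal assignment."""
    return sum(1.0 - max(row) for row in posteriors) / len(posteriors)
```

When candidate solutions with two to eight profiles are compared, each fitted model would contribute its own log-likelihood and parameter count, and the resulting criteria can be tabulated side by side.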
Using latent profile analysis, a four-step approach is established. In the first step, the total sample of students (n = 74,868) is separated into two to eight mixtures, and the resulting seven different mixture distributions are compared based on the fit criteria (Model 1; international model). After deciding on a specific mixture out of these seven, the conditional means of the profiles are calculated.
In the second step, these conditional means are introduced as fixed parameters in national latent profile analyses. That is, the country-specific multidimensional marginal distributions of fourth grade students' achievement values are separated into mixture distributions, in which the number of mixtures and the means of the mixtures are fixed at the values from the international model (Model 2; country-specific models with fixed means). Thus, it can be assessed how well the international model fits the national distributions, and the probability distribution of the students across the profiles (that is, the percentage of students forming the profiles) can be estimated.
In the third step, the national models are again fitted, but this time the means of the conditional distributions are free parameters (Model 3; country specific models with free means). This allows for different profile patterns across countries.
In the final step, the so-called validation step, Model 1 is fitted again, but this time with the number of profiles and the means fixed at the results of Model 1 and with students' background characteristics included. Hence, the relationship between the derived international profiles and the background characteristics of the students can be estimated and compared.
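The logic of the second step (profile means held fixed, profile shares estimated) can be sketched with a small expectation-maximization loop in Python. This is an illustration under strong simplifying assumptions that are not stated in the text: a single common within-profile standard deviation, uncorrelated domains within profiles, no sampling weights, and a single plausible value. All names and the toy numbers in the usage lines are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def estimate_profile_shares(scores, fixed_means, sd, n_iter=200):
    """EM for the mixing proportions only; profile means stay fixed.

    scores: list of (reading, mathematics, science) tuples, one per student.
    fixed_means: list of (reading, mathematics, science) means per profile.
    sd: common within-profile standard deviation (simplifying assumption).
    """
    k = len(fixed_means)
    shares = [1.0 / k] * k
    for _ in range(n_iter):
        counts = [0.0] * k
        for student in scores:
            # E-step: joint density of this student's scores and each profile.
            joint = []
            for share, means in zip(shares, fixed_means):
                density = share
                for x, m in zip(student, means):
                    density *= normal_pdf(x, m, sd)
                joint.append(density)
            total = sum(joint)
            for j in range(k):
                counts[j] += joint[j] / total
        # M-step: only the shares are updated; the means are never touched.
        shares = [c / len(scores) for c in counts]
    return shares


# Toy usage with two hypothetical profiles and four hypothetical students.
toy_means = [(400, 400, 400), (600, 600, 600)]
toy_scores = [(400, 400, 400)] * 3 + [(600, 600, 600)]
toy_shares = estimate_profile_shares(toy_scores, toy_means, sd=50)
```

Freeing the means in the M-step as well would correspond to the logic of the third step, in which the national profile patterns are allowed to differ from the international ones.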

Indicators for subject-specific strengths and weaknesses
The means of the profiles from Model 1 and Model 3 are used to derive four different indicators of subject-specific strengths and weaknesses. For the first indicator, three comparisons of the conditional achievement mean values given profile X (X = 1, …, k; k = number of assumed profiles) are performed for each model (and country) separately. The first comparison involves the difference between reading and mathematics achievement, the second comparison involves the difference between reading and science achievement, and the third comparison involves the difference between mathematics and science achievement.
A subject-specific strength or weakness for subject Y in profile X is designated if the respective comparison results in a difference of at least 10 points, which is the average standard error of the conditional mean across profiles and achievement domains in Model 1. Because there are three competence domains, of which none could be the strongest, or only one could be the strongest, or which could form ordered pairs or ordered triples of the strongest and the weakest, the procedure for designating subject-specific strengths and weaknesses could result in 16 possible subject-specific achievement combinations or patterns within profiles, of which 13 combinations convey different meanings.
The second indicator (which we term "mean within profile heterogeneity") is the average difference between achievement domains (given profile X) across profiles. This indicator can be interpreted as the average heterogeneity of achievement across subjects and profiles. Hence, a high value on the mean within profile heterogeneity indicates a large subject-specific strength or weakness, whereas a value of zero indicates no measurable subject-specific strength or weakness.
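The first two indicators can be sketched in Python as follows. The example uses the international profile 7 means reported in the results (reading 655, mathematics 644, science 657). Taking absolute pairwise differences for the second indicator is an assumption of this sketch, since the text does not spell out the formula; the function names are hypothetical.

```python
from itertools import combinations

THRESHOLD = 10  # points; the cut-off named in the text


def pairwise_differences(profile_means):
    """Signed differences for reading-mathematics, reading-science, mathematics-science."""
    return {
        (a, b): profile_means[a] - profile_means[b]
        for a, b in combinations(profile_means, 2)
    }


def substantial_differences(profile_means, threshold=THRESHOLD):
    """Keep only the comparisons that reach the threshold in absolute value."""
    return {
        pair: diff
        for pair, diff in pairwise_differences(profile_means).items()
        if abs(diff) >= threshold
    }


def mean_within_profile_heterogeneity(profiles):
    """Average absolute pairwise domain difference, averaged over profiles.

    This is one possible reading of the second indicator (assumption).
    """
    per_profile = []
    for means in profiles:
        diffs = [abs(d) for d in pairwise_differences(means).values()]
        per_profile.append(sum(diffs) / len(diffs))
    return sum(per_profile) / len(per_profile)


# International profile 7 (means as reported in the results).
profile_7 = {"reading": 655, "mathematics": 644, "science": 657}
flags = substantial_differences(profile_7)
# flags keeps reading - mathematics (11) and mathematics - science (-13);
# reading - science (-2) stays below the 10-point threshold.
```

Under this reading, profile 7 shows mathematics as relatively weak compared with both reading and science, which matches the relative strengths in reading and science described for the upper profiles.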
For the third and fourth indicators, the difference between the mean of the highest profile and the mean of the lowest profile is calculated for each domain. To calculate the third indicator (mean between-profile variance), these three differences are averaged. For the fourth indicator (mean between-profile heterogeneity), the average absolute difference between these three values (i.e., the average deviation) is estimated. Hence, the third indicator designates the overall distance between the lowest and highest profile across domains. A value of zero indicates perfect overlap of the profiles. The fourth indicator can be interpreted as the degree to which the subject-specific differences across profiles are constant across domains. A value of zero on this indicator indicates that there are no subject-specific profile differences.
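A minimal Python sketch of the third and fourth indicators follows. It reads the "average deviation" as the mean absolute deviation of the three domain-specific ranges from their average, which is an assumption of this sketch. The lowest-profile means in the usage lines are hypothetical placeholders, not values from this study; the highest-profile means are the international profile 7 values.

```python
def between_profile_indicators(lowest, highest):
    """Return (indicator 3, indicator 4) for one model.

    lowest, highest: dicts of domain means for the lowest and highest profile.
    Indicator 3: average of the domain-specific highest-minus-lowest differences.
    Indicator 4: mean absolute deviation of those differences from their average
    (one possible reading of "average deviation"; an assumption).
    """
    ranges = [highest[domain] - lowest[domain] for domain in lowest]
    mean_range = sum(ranges) / len(ranges)
    average_deviation = sum(abs(r - mean_range) for r in ranges) / len(ranges)
    return mean_range, average_deviation


# Highest profile: international profile 7. Lowest profile: hypothetical values.
highest = {"reading": 655, "mathematics": 644, "science": 657}
lowest_hypothetical = {"reading": 350, "mathematics": 360, "science": 340}
indicator_3, indicator_4 = between_profile_indicators(lowest_hypothetical, highest)
```

If the three domain-specific ranges are identical, the second return value is zero, which corresponds to profiles that differ in level only and not in subject-specific shape.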

Analytic strategy
Because plausible values for the achievement domains are used, all analyses in this study were performed five times (once for each plausible value). The results of these analyses were combined according to the formula by Rubin (1987). "Senwgt" is used as the weighting variable for the international model; it sums to a total sample size of 500 students for each country. This produces equal weighting of the countries in the international profile model. For the country-specific profile models, "houwgt" is used, which sums to the observed sample size of students for every country. Unless otherwise stated, all the analyses for this paper were generated using Mplus Version 7.11 (Muthén and Muthén 1998-2012) in combination with the full information maximum likelihood approach.
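The combination of results across the five plausible values can be sketched in Python using Rubin's combining rules in their standard form: the pooled estimate is the average of the separate estimates, and the total variance is the average sampling variance plus the between-analysis variance inflated by (1 + 1/M). The function name and the numbers in the usage lines are hypothetical.

```python
import math


def combine_rubin(estimates, sampling_variances):
    """Pool M analyses (one per plausible value) with Rubin's rules.

    estimates: one point estimate per plausible value.
    sampling_variances: the squared standard error of each estimate.
    Returns the pooled estimate and its standard error.
    """
    m = len(estimates)
    pooled = sum(estimates) / m
    within = sum(sampling_variances) / m
    between = sum((e - pooled) ** 2 for e in estimates) / (m - 1)
    total_variance = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total_variance)


# Hypothetical profile mean estimated once per plausible value.
pooled_mean, pooled_se = combine_rubin(
    [500.0, 502.0, 498.0, 501.0, 499.0],
    [4.0, 4.0, 4.0, 4.0, 4.0],
)
```

The between-analysis component reflects the measurement uncertainty carried by the plausible values, so the pooled standard error is larger than the standard error from any single analysis whenever the five estimates differ.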

Latent profile models
Our first research question asks how many profiles best describe the achievement patterns of students in the participating EU countries. To investigate this, the data for the total sample representing all primary school students attending schools in Europe (n = 74,868) were separated into two to eight mixtures. Table 3 displays the fit criteria for the international latent profile model. As can be seen, the BIC and CAIC consistently decrease as the number of assumed mixtures increases. However, since the change between the models with seven and eight profiles was very small, and because the proportion of students that could be assigned to some of the profiles in the eight-mixture solution is very low, a latent profile model with seven mixtures is chosen.
The proportion of students that can be assigned to the seven profiles and the conditional means of the seven mixtures are shown in Table 4. The profile means on the achievement domains progressively increase from profile one to profile seven; thus, no cross-nested structure for the profiles can be observed. This suggests that the students can mainly be separated by different achievement levels on all domains simultaneously, rather than by subject-specific strengths or weaknesses. However, as shown later in the results, a more detailed look at the profiles reveals separable groups of students with relative subject-specific strengths and weaknesses. After applying the international profile model in each country separately, while holding the means constant, the proportion of misclassified students (classification error), the entropy and the pseudo-reliability can be estimated. The resulting values indicate an acceptable fit of the model in most countries (see Table 5). However, the assignment of the students from the samples of Romania and Malta is not as straightforward. Specifically, at least a quarter of the students in these countries could not be assigned reasonably well to the profiles of the international model. Nevertheless, a transfer from the international profile model to the country-specific distributions of students' achievement values is possible without further restrictions.

Multi-competence profiles of European 4th-graders
Our second research question aims at describing achievement of 4th-grade learners in Europe across subjects. Figure 1 shows the proportions of students in each of the multidimensional proficiency profiles. The profiles are color coded according to the subject specific strengths and weaknesses students show at the different levels of competencies (see Table 8). Codes of similar color shades (e.g. "reds" versus "blues") indicate similarity between the subject-specific strengths and weaknesses.
It can be seen that the profiles are invariantly rank ordered. Almost 60 % of all students are assigned to profiles 5, 6 and 7, which describe cross-subject performance above the subject-specific European averages (reading: 534; mathematics: 519; science: 521). For these profiles, relative strengths in reading and science become apparent. 5.1 % of all students are assigned to profile 7. These students show, with an average of 644 points for mathematics, 655 points for reading and 657 points for science, excellent results above the PIRLS/TIMSS Advanced International Benchmarks in all subjects. For profiles 3 and 4, which describe the cross-subject achievement patterns of 35.8 % of all students, no relative strengths can be observed. For profiles 1 and 2, which describe achievement below the PIRLS/TIMSS Low International Benchmarks, a relative strength in mathematics is observable. However, only 5.2 % of all students can be assigned to these profiles.

Relationship between student traits and the European multi-competence proficiency profiles
Our third research question aims at describing the relationship between student traits and the European multi-competence proficiency profiles. Tables 6 and 7 show the proportions of students assigned to the different profiles by socioeconomic and cultural characteristics and family language as well as by indicators for attitudes towards learning and self-concept. These distributions show that the modeled profiles clearly distinguish between students with different traits. Students from families with relatively high socioeconomic and cultural capital are overrepresented in higher achieving profiles, whereas students from homes with fewer educational resources or an immigrant background are overrepresented in lower achieving profiles. The profiles are also clearly distinguishable by attitude toward learning and self-concept.

Proficiency profiles of students in different European countries
Our fourth research question aims at a European comparison of the proportions of high and low achieving students in each of the multidimensional proficiency profiles. For this, the European model was applied to all countries with a fixed number of latent classes but variation of latent means. This provides information about the distribution of achievement.
Table 6 International achievement profiles of 4th-grade students in Europe by socioeconomic and cultural characteristics and family language
a Parents have professional, academic, or high-level careers as opposed to manual labor work; b At least one parent with a university degree or similar as opposed to no post-secondary education; c More than 100 books at home as opposed to 100 or fewer; d Language of test is spoken seldom or almost never at home as opposed to always or almost always
It can be seen that, with the exception of Finland and Northern Ireland, which have larger proportions of high achievers, in all countries about half of the students are assigned to profiles 4 and 5. The proportions of students in the profiles that describe the cross-subject performance of the high and low achievers vary substantially between countries: Finland (13.1 %), Northern Ireland (10.6 %), Hungary (9.2 %) and Ireland (7.2 %) show the largest percentages of students who show excellent results in all three subjects (profile 7), whereas in all other countries, less than 5 % of all students show comparable results. Taking a look at the lower achieving profiles, Malta (20.5 %), Romania (13.9 %), Poland (7.2 %), Hungary (5.9 %) and Spain (5.5 %) show the largest percentages of students in profiles 1 and 2. In all other countries, the percentage of these very low performing students is below 5 %.

Frequencies of relative subject-specific achievement patterns
Our fifth research question aims at identifying domain-specific strengths and weaknesses between countries. For this, the country-specific achievement profiles (n = 119) were coded according to the relative subject-specific achievement patterns (for country-specific coding see Table 10; Additional file 1: Appendix S1). Table 8 shows the frequencies of relative subject-specific achievement patterns across all European countries in total and for the highest (profiles 6-7) and lowest (profiles 1-3) achieving profiles.
As shown in Table 8, for about 91 % of all profiles across all European countries, a subject-specific strength or weakness for one or two subjects can be observed when a 10-point within-profile achievement difference is considered substantial. 11 % of all profiles describe a relative strength in reading, 22 % a relative strength in mathematics and 23 %
Table 7 International achievement profiles of 4th-grade students in Europe by attitude toward learning and self-concept
a Categorized as the highest category "Likes Reading" of the international "Students like Reading" index; b Categorized as the highest category "Likes Mathematics" of the international "Students like Learning Mathematics" index; c Categorized as the highest category "Likes Science" of the international "Students like Learning Science" index; d Operationalized as the top 50% median-split of the self-concept scale
[Figure legend: percentages of students who perform below average, at average, above average, very well, and exceptionally well in all three domains. The countries are rank ordered by the magnitude of the difference of the cell volumes in Types 6 and 7. In the highlighted cases the volumes of the cells are statistically different from the international mean (p < .05).]
It can also be found that, whereas only 18 % of all patterns found for profiles 6 and 7 show a relative strength for mathematics, about 43 % of profiles 1-3, which describe the low achievers, show such a pattern. For the high achievers, a strength in science becomes apparent in 71 % of all profiles, but only in 31 % of the profiles describing low achievement. Furthermore, the profile showing a singular strength in mathematics is found among profiles 1-3 in 3 of 4 cases. Also, the profile showing a singular strength in reading is most commonly found among the profiles describing low achievers.

Similarities and differences in relative subject-specific achievement patterns between countries
Our sixth research question aims at identifying similarities and differences in relative subject-specific achievement patterns between countries. To describe and compare the within-country differences of achievement profiles, in terms of both the size of achievement differences between subjects and the discrimination of the profiles between competence levels and subjects, different indicators were defined (see section on indicators for subject-specific strengths and weaknesses). In a second step, the distribution of subject-specific achievement patterns by country was analyzed and rated according to the within-country profile heterogeneity. Here, the within-country profile heterogeneity was considered "low" when at least five out of seven profiles followed the same pattern of subject-specific strengths and weaknesses, "rather low" when there was more variety of codes but the patterns found showed an equal hierarchy of subjects, and "high" when there was a variety of codes and rather diverse subject-specific strengths and weaknesses.
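The rating rule described above can be illustrated with a minimal Python sketch. It is not the procedure used for Table 10, only one possible reading of it. Two things are assumptions made for illustration: each profile's pattern is encoded as a rank per subject (1 = strongest, ties allowed), and "an equal hierarchy of subjects" is taken to mean that no pair of subjects is ordered in opposite directions in two profiles of the same country. All names and example patterns are hypothetical.

```python
from collections import Counter
from itertools import combinations

SUBJECTS = ("reading", "mathematics", "science")


def consistent_hierarchy(patterns):
    """True if no pair of subjects is ranked in opposite order in two profiles.

    Assumption: this is how "an equal hierarchy of subjects" is operationalized here.
    """
    for a, b in combinations(SUBJECTS, 2):
        directions = set()
        for ranks in patterns:
            if ranks[a] < ranks[b]:
                directions.add("a_stronger")
            elif ranks[a] > ranks[b]:
                directions.add("b_stronger")
        if len(directions) == 2:
            return False
    return True


def rate_heterogeneity(patterns):
    """Rate a country's seven profile patterns as 'low', 'rather low' or 'high'."""
    codes = Counter(tuple(p[s] for s in SUBJECTS) for p in patterns)
    most_common_count = codes.most_common(1)[0][1]
    if most_common_count >= 5:  # at least five of seven profiles share one pattern
        return "low"
    if consistent_hierarchy(patterns):
        return "rather low"
    return "high"


# Hypothetical rank patterns (1 = strongest subject, equal ranks = no difference).
science_first = {"reading": 2, "mathematics": 3, "science": 1}
science_reading_tied = {"reading": 1, "mathematics": 2, "science": 1}
mathematics_first = {"reading": 2, "mathematics": 1, "science": 2}
balanced = {"reading": 1, "mathematics": 1, "science": 1}

uniform_country = [science_first] * 6 + [balanced]
ordered_country = [science_first] * 3 + [science_reading_tied] * 4
mixed_country = [mathematics_first] * 3 + [science_reading_tied] * 4

for name, country in [("uniform", uniform_country),
                      ("ordered", ordered_country),
                      ("mixed", mixed_country)]:
    print(name, rate_heterogeneity(country))
```

In this sketch a country whose low achievers are strongest in mathematics while its high achievers are strongest in science is rated "high", because the ordering of mathematics and science reverses between profiles.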
As shown in Table 9, countries differ in their average heterogeneity of achievement across subjects and profiles (mean within-profile heterogeneity). Hence, in some countries, subject-specific strengths and weaknesses are more distinct than in others: Germany (11), Hungary (11) and Slovenia (11) show the lowest average differences among domains across all seven achievement profiles. Here, the achievement results [...] (20), subject-specific strengths and weaknesses can be observed much more frequently. The column headed "Average of conditional domain specific between profile variance" shows the overall distances between the lowest and highest profile across domains (shown in Table 10). The values between 299 and 463 indicate that in all countries the profiles cover a wide range of achievement levels. Interestingly, in more than half of the countries, the largest difference between the mean of the highest profile and the mean of the lowest profile is found for mathematics, whereas for only two countries (Malta and Germany) the largest between-profile variance is found for reading.
Table 9 also shows the degree to which the domain-specific differences across profiles are constant across domains (mean between-profile heterogeneity). Hungary (8), Lithuania (17), Ireland (18), Finland (21) and the Czech Republic (22) show the smallest differences among the European countries, whereas the biggest differences are found in Malta (83), Sweden (53), Romania (46) and Spain (41).
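To make the three indicators more concrete, the following Python sketch shows one way such quantities could be computed from a country's conditional profile means. The exact definitions are given in the section on indicators and are not reproduced here, so the formulas below are assumptions: within-profile heterogeneity is taken as the average gap between the strongest and weakest subject within a profile, the between-profile range as the distance between the highest and lowest profile mean per subject, and between-profile heterogeneity as the spread of those ranges across subjects. The profile means are invented for illustration.

```python
from statistics import mean

SUBJECTS = ("reading", "mathematics", "science")


def within_profile_heterogeneity(profile_means):
    """Average, over profiles, of the gap between strongest and weakest subject."""
    return mean(
        max(p[s] for s in SUBJECTS) - min(p[s] for s in SUBJECTS)
        for p in profile_means
    )


def between_profile_ranges(profile_means):
    """Per subject: highest profile mean minus lowest profile mean."""
    return {
        s: max(p[s] for p in profile_means) - min(p[s] for p in profile_means)
        for s in SUBJECTS
    }


def between_profile_heterogeneity(profile_means):
    """Spread of the between-profile ranges across subjects (assumed definition)."""
    ranges = between_profile_ranges(profile_means)
    return max(ranges.values()) - min(ranges.values())


# Hypothetical conditional means for three profiles of one country.
example = [
    {"reading": 400.0, "mathematics": 410.0, "science": 395.0},
    {"reading": 500.0, "mathematics": 505.0, "science": 510.0},
    {"reading": 600.0, "mathematics": 635.0, "science": 620.0},
]

print(within_profile_heterogeneity(example))
print(between_profile_ranges(example))
print(between_profile_heterogeneity(example))
```

Under these assumed definitions, a small between-profile heterogeneity means that the distance between the lowest and highest profile is about the same in every subject, which is the situation described for Hungary.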
As shown in Table 10, the distributions of achievement patterns vary considerably between countries. Looking at the distributions within countries, seven countries exhibit low or rather low heterogeneity, whereas the ten other countries exhibit high heterogeneity. For Finland, Lithuania, Northern Ireland and Poland, the country-specific achievement profile across subjects follows the same pattern at all levels of competencies. Only profile 1 diverges, though it represents the achievement of 1.0-2.6 % of learners at the lowest end of the scale. Here, no differences between the subjects could be found. However, the profiles do differ between countries: In Finland, students at all achievement levels show a relative strength in science followed by reading. In Lithuania, students at all levels are stronger in mathematics, whereas science is a clear weakness in Poland and in Northern Ireland. In Hungary, for the high achievers (profiles 7 and 6), a relative strength in science followed by reading is observed, whereas for all other profiles students perform equally well in science and reading. For Ireland, a relative strength in reading followed by mathematics is found. In Italy, a relative strength in reading followed by science is visible. In both the Czech Republic and Sweden, with a higher degree of heterogeneity, a relative strength in science followed by reading can be found.
For countries with a high degree of within-country profile heterogeneity, two groups can be distinguished: in Austria, Germany, Malta, Slovenia and Spain, low performers show a relative strength in mathematics, whereas high achievers show a relative strength in reading and science. In the Slovak Republic and Romania, a relative strength in science is found for the high achievers, whereas low achievers show a relative strength in reading.

Conclusions
In public debates, students are often described as learners with specific strengths or weaknesses in one specific learning area. However, several studies show that achievement results in mathematics, reading literacy and science are highly correlated. In this article we aim at exploring multidimensional proficiency patterns of fourth-grade students in Europe across achievement results in reading, mathematics and science. The joint assessment of PIRLS-2011 and TIMSS-2011 in a representative sample of grade-4 students provides a unique dataset to discover common structures across achievement domains and groups of learners with different achievement patterns.
To generate an achievement typology that describes patterns across the three domains, a latent profile analysis (LPA) was performed on the plausible values of the students overall and by country. All models show a reasonable fit and, in almost all countries, at least 75 % of the students can be reasonably described by the achievement profiles found. These profiles provide an informative approach to identifying and comparing proportions of learners at different levels of competences. Our findings on the proportions of low- and high-achieving students are consistent with the findings from Mullis (2013). An additional strength of the profiles is that they can be further explored according to their relationship(s) to different student traits. Here, we find that the profiles were highly distinguishable with regard to their background characteristics and consistent with the findings reported in the international reports. This supports the claim that the profiles allow a useful description of variation between countries on a singular outcome.
In addition, modeling by achievement profiles within countries makes it possible to investigate patterns of achievement by subject-specific strengths and weaknesses. Based on the derivable patterns, a comparison between subject-specific achievement patterns across different proficiency levels can reveal differences.
Concerning subject-specific strengths and weaknesses, we find no cross-nested structure for the profiles in either the European model or the country models. Hence, achievement across domains can be explained by a general level of achievement rather than by subject-specific strengths or weaknesses of learners, which suggests that the education systems of the analyzed member states of the European Union are doing a good job in providing a balanced education for their students regardless of the profile level. However, some subject-specific strengths and weaknesses can be identified, but these are, with the exception of Malta and Northern Ireland, rather small for most of the countries. For the European model, it can be shown that those students who perform above average have a relative strength in reading and science, whereas low-performing students show a relative strength in mathematics. However, it can also be shown that this international pattern represents only the patterns of Austria, Germany, Malta, Slovenia and Spain rather well. For about half of the countries, a rather uniform pattern of subject-specific strengths and weaknesses can be found on all competence levels. For these countries, it can be stated that they are educating all their students more successfully in one or two subjects than in the other(s). The subject itself varies between the countries. The country profiles were further distinguished according to their degree of within- and between-profile heterogeneity. Here, for countries with a relatively larger proportion of high achievers (Hungary, Finland, Czech Republic, Northern Ireland), the same pattern of strengths and weaknesses can be identified for all profiles and, with the exception of Northern Ireland, the degree of the differences within profiles is rather small.
This, together with the finding that profiles describing a singular strength in mathematics or reading are overrepresented among the low achievers, might suggest that a successful education for all with a sufficient number of high achievers requires a solid educational basis in all subjects. However, the results also support the finding that high achievers show a relative strength in science and, to a lesser extent, in reading.
In order to study various within- and between-country differences in proficiency patterns, we classified a 10-point within-profile difference in achievement between domains as substantial. We chose this criterion because 10 points is exactly the average standard error of the (conditional) mean across profiles and achievement domains. Thus, only differences at or above one (average) standard error are considered relevant in this paper. Although this criterion is somewhat arbitrary, differences of less than 10 points should surely not be seen as relevant once the sampling error (as expressed by the standard error) of the plausible values (and the profile solutions) is taken into account. Therefore, the 10 points can be seen as a lower threshold for comparing relative subject-specific strengths and weaknesses. With this lower threshold, about 9 out of 10 profiles were considered to reveal a subject-specific strength or weakness. If a higher threshold is used (for example, differences at or above two standard errors), 68 % of all profiles would still be considered to reveal a subject-specific strength or weakness. That is, the results reported in this paper do not change considerably even when a higher threshold is used to derive subject-specific strengths or weaknesses. Nevertheless, further research is needed to investigate the sensitivity of the test statistic derived here to different threshold criteria.
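The role of the threshold can be illustrated with a short Python sketch. It is a simplified stand-in for the coding scheme used in this paper, not a reproduction of it: a subject is flagged as a relative strength when its conditional mean lies at least `threshold` points above the weakest subject of the same profile, which is an assumption made for illustration. The profile means are hypothetical and are not taken from Tables 8-10.

```python
from typing import Dict, List

SUBJECTS = ("reading", "mathematics", "science")


def relative_strengths(means: Dict[str, float], threshold: float = 10.0) -> List[str]:
    """Subjects whose mean is at least `threshold` points above the weakest subject.

    Assumed rule for illustration; an empty list means a balanced profile.
    """
    lowest = min(means[s] for s in SUBJECTS)
    return [s for s in SUBJECTS if means[s] - lowest >= threshold]


def share_flagged(profiles: List[Dict[str, float]], threshold: float) -> float:
    """Proportion of profiles showing at least one relative strength."""
    flagged = sum(1 for p in profiles if relative_strengths(p, threshold))
    return flagged / len(profiles)


# Hypothetical conditional profile means (not values from this study).
profiles = [
    {"reading": 430.0, "mathematics": 445.0, "science": 428.0},
    {"reading": 500.0, "mathematics": 496.0, "science": 503.0},
    {"reading": 560.0, "mathematics": 548.0, "science": 571.0},
    {"reading": 610.0, "mathematics": 600.0, "science": 625.0},
]

# Compare one (average) standard error with two standard errors as threshold.
for threshold in (10.0, 20.0):
    print(threshold, share_flagged(profiles, threshold))
```

Re-running such a classification over a grid of thresholds is one simple way to examine how sensitive the share of profiles with a subject-specific strength or weakness is to the chosen criterion.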
In addition, the presented results on within- and between-country differences with regard to subject-specific strengths and weaknesses should not be overinterpreted. However, from an international comparative perspective, it might be of interest to further understand why some countries show rather homogeneous patterns of strengths and weaknesses for all students regardless of proficiency level, whereas in other countries the subject-specific strengths and weaknesses differ between low- and high-achieving students.
What could be further research directions? Generally, it will be highly challenging to develop a coherent framework for deriving useful indicators that explain subject-specific strengths and weaknesses within profiles, between profiles and across countries. Among others, such indicators should be derived from input factors (e.g. instructional time) and process factors (e.g. cross-cultural differences in emphasis of subjects). The question arises whether research traditions from economics could be helpful here too. From a statistical point of view, it seems highly challenging to introduce the classes (schools) as a cluster factor in the mixture models, as there are usually only small samples within the classes (schools). Nevertheless, the correlation structure of large-scale assessment samples calls for such techniques. In addition, explaining the variance of profiles within a country and across schools could provide highly valuable information. Furthermore, extensions of the latent mixture model to the subscales of PIRLS (reading purposes; comprehension processes) and TIMSS (content domains; cognitive demands) seem worth considering. Finally, the interdependent relationship between the well-established benchmarks and the newly developed proficiency profiles could be further studied.