Measuring students' generic skills through national assessment

Article

We start with the guidance flow for students in PBIM. This guidance enables wider collaboration between teachers and students. According to Baneres and Conesa (2017), five important indicators, called capstone elements, are involved in integrating theory and practice: teamwork, problem solving, decision making, critical thinking, and communication. Teamwork is a soft skill that is receiving growing attention in an increasingly global world of work (Baneres & Conesa, 2017).
One of the tools for assessing PBIM is the e-portfolio. Research by Bezanilla et al. (2019), Karami et al. (2019), and Lai et al. (2017) on the tools teachers use to measure GS (problem solving, critical thinking, creative thinking, oral and written communication, social interactions, ethical decisions, and global perspectives) identified e-portfolios and self-surveys.
Constructivism, the underlying principle of GS in Indonesian madrasahs, encourages social and communication skills by fostering a classroom climate that focuses on teamwork and the sharing of ideas (Derry, 1996). By participating in group assignments, students must develop their ability to communicate their ideas clearly and to work well in teams. In this study, three social literacy indicators in Indonesian madrasahs, based on GS level, serve as the GS indicators. The GS levels are basic, advanced, and needs creative space. These three levels refer to the GS indicators to be measured, namely critical and creative thinking (C. K. Y. Chan, Fong, et al., 2017), creativity (C. K. Y. Chan & Fong, 2018; C. K. Y. Chan, Zhao, et al., 2017), social skills (C. K. Y. Chan, Fong, et al., 2017), and interpersonal skills (C. K. Y. Chan, Zhao, et al., 2017).
Measuring these three indicators requires accurate instruments, one of which is the self-survey. The self-survey is currently the most common method for evaluating the impact of undergraduate education on generic skills development. Survey items focus on students' perceptions of their progress in decision making, problem solving, analytical skills, collaboration, communication, ethical development, and vocational preparation (Ginns et al., 2007; Zhao & Kuh, 2004; Webster et al., 2009). The researchers therefore prepared this self-survey as a non-test instrument consisting of multiple-choice items on a Likert scale to be answered by students.
This study aims to measure the GS level of Madrasah Aliyah (MA) students. The research can also be used to evaluate learning outcomes in Madrasah Aliyah. The GS measurement instrument can serve as a reference for measuring two other indicators among the five socio-cultural literacy indicators. This matters for several reasons, including that madrasahs, which are managed under the Ministry of Religious Affairs and more than 90% of which are run by the private sector, often have shortcomings in funding, teacher quality, and facilities and infrastructure (Umar et al., 2022).

RESEARCH METHOD
This study used a self-survey method, following Creswell and Clark (2018), and focused on measuring the GS achievement of Madrasah Aliyah students in Indonesia. The subjects were Madrasah Aliyah students throughout East Java and Central Java, who represented Indonesia. This choice is based on EMIS data from the Ministry of Religion, which recorded 91 State Madrasah Aliyahs and 1,752 Private Madrasah Aliyahs in East Java, and 65 State Madrasah Aliyahs and 622 Private Madrasah Aliyahs in Central Java. Data were collected with a questionnaire distributed through Google Forms; the link was sent to teachers in each madrasah and to subject teacher groups in East Java and Central Java. The data were then analyzed quantitatively using SPSS version 26.
The interpersonal skills instruments developed were validated by credible experts (Turrado-Sevilla & Cantón-Mayo, 2022), consisting of experts in the field of study, teachers, and evaluation experts. The experts' suggestions were used to complete the contents of the instrument with respect to its relevance, scope, and sequence. Instrument validity was measured using Aiken's V analysis with six raters. Each item in each dimension was rated on a scale of 1 to 5, ranging from not appropriate to very appropriate. The expert assessment yielded Aiken's V values above 0.79, which means that the instrument is valid (Aiken, 1985).
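The Aiken's V calculation mentioned above can be illustrated with a minimal sketch. It assumes the standard form of the coefficient, V = sum(r - lo) / (n * (hi - lo)), where r is each rater's score, lo and hi are the lowest and highest scale points (1 and 5 here), and n is the number of raters. The function name and the example ratings are hypothetical and are not taken from the study's data.

```python
def aikens_v(ratings, lo=1, hi=5):
    """Aiken's V for a single item, assuming V = sum(r - lo) / (n * (hi - lo))."""
    n = len(ratings)
    if n == 0:
        raise ValueError("at least one rating is required")
    # Distance of each rating from the lowest scale point
    s = sum(r - lo for r in ratings)
    return s / (n * (hi - lo))


# Hypothetical ratings from six raters on a 1-5 scale (not the study's data)
example_ratings = [5, 5, 4, 5, 4, 5]
v = aikens_v(example_ratings)
print(f"Aiken's V = {v:.4f}, above 0.79: {v > 0.79}")
```

The coefficient ranges from 0 (all raters choose the lowest point) to 1 (all raters choose the highest point), so a cutoff such as 0.79 is read on that scale.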
The data collected include (1) gender, madrasah name, and madrasah status, and (2) students' GS, which covers religious moderation, CTPS, and interpersonal skills. The questions are open-ended, single-answer items with a rating scale of 1 to 4. The statistical analysis is descriptive. Qualitative variables are presented as counts (n) and percentages (%), and quantitative variables as means (m) and standard deviations (SD); answers on the Likert scale are analyzed separately according to the content analysis guide (Sugiyono, 2018). The data obtained through the Google Forms questionnaire are analyzed with descriptive statistics and with validity and reliability tests, with the help of SPSS version 16.

Instrument Test
The questionnaire that will be used as a data collection tool is first tested for validity and reliability. This test is intended to measure the feasibility level of the questionnaire as a data collection tool. The results of the validity and reliability of the research questionnaire can be explained as follows.

Instrument Validity Test
The calculation correlates each item score with the total score using correlation analysis. The test criterion is that if the correlation coefficient is greater than r table = 0.2759, the indicator is valid for measuring the construct in question and is accepted as a data collection tool. The results of the validity test can be seen in Table 2.
Based on the results of testing the validity of the instrument, it was found that all indicators in Table 2 produced a correlation coefficient value greater than r table = 0.2759. Thus, it can be concluded that all indicators in Table 2 are valid and can be used as a data collection tool in this study. The results of calculating the validity of the contents of the GS instrument using the Aiken formula from the assessment of six experts in the field of Islamic religious education in each aspect can be seen from Table 3. The calculation results show an overall average above 0.79, which means that all instrument items can be said to be valid (Yuliarto, 2021).
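The item-total correlation check described in this section can be sketched as follows. This is an illustration only: it assumes a plain Pearson correlation between each item and the unadjusted total score, the function names are hypothetical, and the response matrix is invented. The threshold 0.2759 is the r table value reported above and depends on the sample size, so it is applied to the tiny example purely to show the mechanics.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def item_total_validity(responses, r_table=0.2759):
    """Return (item index, r, is_valid) for each item column.

    responses: list of rows, one row of item scores per respondent.
    """
    totals = [sum(row) for row in responses]
    results = []
    for j in range(len(responses[0])):
        item = [row[j] for row in responses]
        r = pearson_r(item, totals)
        results.append((j, r, r > r_table))
    return results


# Hypothetical responses: 6 respondents x 3 items on a 1-4 scale
responses = [
    [4, 3, 4],
    [3, 3, 2],
    [2, 2, 1],
    [4, 4, 3],
    [1, 2, 2],
    [3, 4, 4],
]
for j, r, valid in item_total_validity(responses):
    print(f"item {j + 1}: r = {r:.3f}, valid = {valid}")
```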

Instrument Reliability Test
The instrument reliability test was used to determine the consistency of the instrument as a measuring tool, so that its measurements can be trusted. The test used Cronbach's Alpha, under which an instrument is considered reliable if the Alpha coefficient is more than 0.6 (Purnomo, 2016). The summary of the questionnaire reliability test results for all valid items, according to the SPSS output, can be seen in Table 4. Based on Table 4, the Cronbach's Alpha value for every variable in this study is more than 0.600, so all questions for these research variables are declared consistent, reliable, and suitable for use as a data collection tool.
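For readers who want to see how a Cronbach's Alpha value is obtained, the following is a minimal sketch. It assumes the usual formula alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores), with sample variances. The function name and the response matrix are hypothetical; the study's own values come from SPSS and are reported in Table 4.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's Alpha for a list of respondent rows (one score per item).

    Assumes alpha = k / (k - 1) * (1 - sum(item variances) / total variance),
    using sample variances (n - 1 in the denominator).
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("at least two items are required")
    item_vars = [variance([row[j] for row in responses]) for j in range(k)]
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: 6 respondents x 3 items on a 1-4 scale
responses = [
    [4, 3, 4],
    [3, 3, 2],
    [2, 2, 1],
    [4, 4, 3],
    [1, 2, 2],
    [3, 4, 4],
]
alpha = cronbach_alpha(responses)
print(f"alpha = {alpha:.3f}, above 0.6: {alpha > 0.6}")
```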

Descriptive Analysis
The descriptive analysis describes each research variable in terms of its minimum, maximum, median, average, and standard deviation, as well as the frequency distribution of the categorization results. The results of the descriptive analysis are shown in Table 5. Based on Table 5, out of a total of 51 respondents, the lowest religious moderation score is 1.8 and the highest is 4.00. The average religious moderation score was 3.27 and the median was 3.2, with a standard deviation of 0.59. The standard deviation, which is smaller than the average, indicates that the variation in religious moderation scores between respondents tends to be small.
Table 5 also shows that, out of the 51 respondents, the lowest CTPS score is 2.27 and the highest is 4.00. The average CTPS score was 3.22 and the median was 3.00, with a standard deviation of 0.520. The standard deviation, which is smaller than the average, indicates that the variation in CTPS scores between respondents tends to be small. Furthermore, out of the 51 respondents, the lowest interpersonal intelligence score is 1.00 and the highest is 4.00. The average interpersonal intelligence score was 3.18 and the median was 3.00, with a standard deviation of 0.71. The standard deviation, which is smaller than the average, indicates that the variation in interpersonal intelligence scores between respondents tends to be small.
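The comparison of each standard deviation with its mean can be made explicit as a ratio. The sketch below computes the coefficient of variation (SD divided by mean) from the means and standard deviations reported above. The coefficient of variation is used here only as one common way to express that comparison; it is an assumption of this sketch, not a statistic reported in the study.

```python
def coefficient_of_variation(mean, sd):
    """Ratio of the standard deviation to the mean."""
    if mean == 0:
        raise ValueError("mean must be non-zero")
    return sd / mean


# (mean, standard deviation) as reported in the text for Table 5
reported = {
    "religious moderation": (3.27, 0.59),
    "CTPS": (3.22, 0.520),
    "interpersonal intelligence": (3.18, 0.71),
}

for name, (mean, sd) in reported.items():
    cv = coefficient_of_variation(mean, sd)
    print(f"{name}: SD/mean = {cv:.3f}")
```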

Respondent's Perception
Assessment is categorized based on the scores of respondents' responses, with the number of assessment categories determined by the measurement scale used, which has four classifications. Based on the calculated class length for each interval, Table 6 presents the classification of the assessment categories for the arithmetic mean. The scale in Table 6 serves as a reference for assessing the results of the questions related to the variables discussed in this study. Respondents' perceptions of each variable are described below and presented in full in Table 7.
Religious moderation is the first indicator of GS. The highest result is 54.03% on Likert scale 4, which means that respondents have a good attitude of religious moderation; 36.38% are in the very good category, 8.93% are in the sufficient category, and the remaining 0.65% are in the less category. These results suggest that attitudes of religious moderation have improved compared with 2017, when there was a lot of violent behavior due to a lack of understanding of religious teachings.
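The class-length calculation behind the categorization in Table 6 can be sketched as follows. Table 6 itself is not reproduced here, so this sketch assumes the common rule class length = (highest score - lowest score) / number of classes, which gives (4 - 1) / 4 = 0.75 for a 1 to 4 scale with four categories. The function names, the inclusive upper bounds, and the mapping of intervals to the labels less, sufficient, good, and very good are assumptions for illustration.

```python
def class_length(min_score=1, max_score=4, n_classes=4):
    """Width of each category interval on the rating scale."""
    return (max_score - min_score) / n_classes


def categorize(mean, labels=("less", "sufficient", "good", "very good"),
               min_score=1, max_score=4):
    """Map a mean score to a category label.

    Upper interval bounds are treated as inclusive (an assumption).
    """
    if not min_score <= mean <= max_score:
        raise ValueError("mean is outside the rating scale")
    length = class_length(min_score, max_score, len(labels))
    for i, label in enumerate(labels):
        upper = min_score + length * (i + 1)
        if mean <= upper:
            return label
    return labels[-1]


print("class length:", class_length())
# Hypothetical mean scores, not values from the study
for score in (1.5, 2.0, 3.0, 3.8):
    print(score, "->", categorize(score))
```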
Religious moderation is an important part of the Indonesian state. Pancasila (the five pillars of the nation) and the 1945 Constitution, which form the basis of national education, show that the government instills a moderate education by making Pancasila the basis of education. The government has also taken strategic steps to realize an attitude of religious moderation: in Attachment 1 of Presidential Regulation No. 18 of 2020 concerning the National Medium-Term Development Plan for 2020-2024, it is one of the human resources development strategies, especially in character building.
This effort must continue in a more holistic and integrative manner, especially in the education system, by emphasizing the values of integrity, work ethic, mutual aid, and ethics. These ethics include ethics in learning and in social systems. Religious education must be instilled together with the noble values of the nation's culture in family institutions and in interactions between citizens. This is needed to strengthen harmony and to improve the culture of literacy and innovation. It will in turn give rise to the creativity needed for a knowledgeable, innovative, creative, and characterful society. This is corroborated by the research of Muzaqi et al. (2022), which found that students increasingly show a good attitude of religious moderation after it is integrated into learning.
From Table 8, the respondents' response to the CTPS variable was 67.99% on Likert scale 4, which indicates that the respondents' critical thinking skills and creativity are in the good category. According to the data, 5.94% of students fell into the sufficient category, while 27.52% of respondents fell into the very good category. The indicator rated highest by respondents was the statement "In my opinion, every answer must have a basis", with an average of 3.43. The indicator rated lowest was the statement "I associate one thing with something to solve a difficulty", with an average of 3.00.
According to Rustam and Priyanto (2022), mastering CTPS develops critical reasoning and the ability to make decisions, be creative, and draw logical conclusions, all of which students need in order to master knowledge related to their real-world lives. This was confirmed by Greiff et al. (2013), who stated that CTPS is a needed soft skill. A study by Yunfeng He and colleagues reported similar results (He et al., 2018), and in some publications CTPS is the most needed soft skill (Klegeris, 2021). According to Klegeris, CTPS ability is influenced by the interactional techniques teachers use in learning (Klegeris et al., 2017). There is also a consensus that lecture-based teaching is not effective for developing CTPS (Newton et al., 2015). Thus, to strengthen students' CTPS abilities, teachers must consider which learning models they select in order to improve students' critical thinking skills, such as the Jigsaw model and discovery learning (Usman et al., 2022).
Based on the data in Table 9, the respondents' response to the interpersonal intelligence variable was 55.88% on Likert scale 4, which means that respondents had interpersonal intelligence in the good category. A further 31.96% of respondents were in the very good category, 10.99% were in the sufficient category, and 1.18% were in the less category. The indicator rated highest by respondents was the statement "I always cooperate with friends in organizational activities and other activities at school", with an average of 3.35. The indicator rated lowest was the statement "I can easily remember other people's faces even though I only met once", with an average of 3.00.
Interpersonal intelligence is one of the intelligences that determine student success; Istapra et al. (2021) found that interpersonal intelligence has a significant influence on learning. Another study stated that students with high interpersonal intelligence are able to establish effective communication with other students, have high empathy, and are able to work in groups (Abas et al., 2019). The results of this study further strengthen those findings, as the indicator of collaborating with friends in organizational and other activities is in the good category on the 1-4 Likert scale.

CONCLUSION
GS measurements of Madrasah Aliyah students in Indonesia showed low results, especially in the aspects of religious moderation and interpersonal skills. Both aspects are in the range of 50-56% on Likert scale 4. For this reason, policymakers should promptly create a program to improve GS in MA. Based on the results of this study, it can be concluded that the GS of Madrasah Aliyah students still needs to be improved through training that supports the development of students' GS. The Ministry of Religion needs to hold short courses for MA teachers, who have a role in guiding students and equipping them with skills for the 4.0 era. Moreover, student-centered learning needs to be emphasized.