Antecedents of adolescent students’ ICT self-efficacy: The ICT dataset

Based on the Programme for International Student Assessment (PISA) 2015 dataset, the information and communication technology (ICT) dataset focuses specifically on ICT-related constructs in the context of educational technology. It includes a wide range of student-level variables collected from 30 Organisation for Economic Co-operation and Development (OECD) countries, which pertain to students' motivational and behavioural characteristics in relation to their ICT self-efficacy. In total, it comprises 201,652 students from 7708 schools. As technology has become an integral component of education, the ICT dataset can serve as a handy resource for studying ICT-related constructs. In addition, the ICT dataset offers advantages over the original PISA dataset through its focused scope and ease of use. With this resource, researchers can pursue their own research in ICT-related fields, developing new theories or validating existing theoretical frameworks and claims. The focus of this study is to identify the antecedents of adolescent students' ICT self-efficacy and to illuminate the potential mechanisms at work.


Specifications
Type of data: 1 figure (Fig. 1), CSV files, 3 R scripts, supplementary materials (correlation matrix)
How data were acquired: Data were acquired from the Programme for International Student Assessment (PISA) 2015 dataset (URL: http://www.oecd.org/pisa/data/2015database/). Of the series of student, teacher, principal and curriculum questionnaires administered in PISA 2015 [3], only data relevant to identifying ICT-related predictors of students' performance were acquired, namely from the Student Questionnaire and the ICT Familiarity Questionnaire (URL: http://www.oecd.org/pisa/data/2015database/), covering 30 Organisation for Economic Co-operation and Development (OECD) countries and regions.
Data format: Raw; filtered
Parameters for data collection: To investigate the relationship between adolescents' interest in ICT and their ICT self-efficacy, the dataset was compiled from eight PISA 2015 variables related to ICT interest, ICT use and ICT self-efficacy, which gives the dataset its particular value for this study. The eight variables were as follows: (1) the independent variable, students' interest in ICT (coded as INTICT), which was introduced for the first time in PISA 2015; (2) the dependent variable, students' self-perceived competence in using ICT (coded as COMPICT), which was also introduced for the first time in PISA 2015; (3) four mediating variables, categorized as behavioral factors: "use of ICT at school in general" (USESCH), "ICT use outside of school for schoolwork" (HOMESCH), "ICT use outside of school for leisure" (ENTUSE), and "students' ICT as a topic in social interaction" (SOIAICT); (4) two control variables: student gender and the index of economic, social and cultural status (ESCS).

Description of data collection
The primary data were drawn from the official OECD website (URL: http://www.oecd.org/pisa/data/2015database/), which also hosts the questionnaires. In this study, the data were taken from the Student Questionnaire and the ICT Familiarity Questionnaire data files, which contain student-level responses to a wide range of background variables and outcome measures [3], and comprise eight variables. The raw data are provided in the supplementary materials. In particular, INTICT was derived from six items measuring students' overall enjoyment of ICT; COMPICT was derived from five items measuring how competent students perceived themselves to be in using ICT-related knowledge or skills; and USESCH, HOMESCH, ENTUSE and SOIAICT were derived from 9, 12, 13 and 5 items, respectively, which captured the extent to which students actually engaged in ICT-related activities.

Value of the Data
• The ICT dataset distinguishes itself through a narrow but intensive focus on ICT-related constructs, which are highly relevant to today's educational reality.
• The dataset can facilitate an understanding of the complex relationships between ICT-related motivational and behavioural factors.
• The ICT dataset offers advantages over the original PISA dataset through its easy accessibility and simple structure.

Data Description
To investigate the relationship between adolescents' interest in information and communication technology (ICT) and their ICT self-efficacy, this dataset was compiled from a number of ICT-related variables in the Programme for International Student Assessment (PISA) 2015 dataset (URL: http://www.oecd.org/pisa/data/2015database/) using the programming language R (URL: https://www.R-project.org), resulting in a sample of 201,652 students from 7708 schools in 30 Organisation for Economic Co-operation and Development (OECD) countries. During data pre-processing, missing data were imputed with the expectation-maximization (EM) algorithm [1] using the statistical package SPSS 20. A 1-1-1 multilevel mediation model was adopted for data analysis and fitted using the lavaan package [5] in R. The independent variable was adolescent students' ICT interest, and the dependent variable was ICT self-efficacy. The mediators were students' ICT use at school, ICT use outside of school for homework and for leisure, and ICT use for social interaction. The control variables were student gender and socioeconomic status. Given the complexity and large volume of the original dataset, this dataset offers a convenient and accessible alternative for researchers interested in ICT-related constructs. All tables referred to below are taken from the related research article [4]. Information on the sample and the descriptive statistics of the main variables is provided for each country in Table 1.
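To make the roles of these variables concrete, the structural equations of a parallel mediation model can be written as below. This is a simplified, single-level sketch that assumes the four mediators act in parallel and that both controls enter every equation; the 1-1-1 multilevel model used in the study additionally separates each variable into within-school and between-school components, which is not shown here. Here $X$ is ICT interest (INTICT), $M_1,\dots,M_4$ are USESCH, HOMESCH, ENTUSE and SOIAICT, $Y$ is ICT self-efficacy (COMPICT), $i$ indexes students and $j$ schools; the coefficient symbols are illustrative labels, not the notation of the original article.

$$
\begin{aligned}
M_{k,ij} &= \alpha_k + a_k X_{ij} + \gamma_{k1}\,\text{Gender}_{ij} + \gamma_{k2}\,\text{ESCS}_{ij} + \varepsilon_{k,ij}, \qquad k = 1,\dots,4,\\
Y_{ij} &= \beta_0 + c' X_{ij} + \sum_{k=1}^{4} b_k M_{k,ij} + \delta_1\,\text{Gender}_{ij} + \delta_2\,\text{ESCS}_{ij} + \varepsilon_{ij},\\
\text{indirect}_k &= a_k b_k, \qquad c = c' + \sum_{k=1}^{4} a_k b_k, \qquad \text{proportion mediated}_k = \frac{a_k b_k}{c}.
\end{aligned}
$$

The decomposition $c = c' + \sum_k a_k b_k$ holds exactly when all equations are estimated by ordinary least squares on the same sample with the same controls; in multilevel estimation it holds only approximately, and "proportion mediated" is one common, but not the only, way of expressing the relative size of an indirect effect.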
The questionnaires from which the variables were derived can be accessed at the website of the OECD (URL: http://www.oecd.org/pisa/data/2015database/), and the technical details of the scale construction and validation procedures can be found in the official 2015 PISA technical report [3] .
The raw data include information about 201,652 students from 7708 schools. For each OECD country, the data contain eight variables: ICT interest, ICT self-efficacy, ICT use at school, ICT use at home for schoolwork, ICT use for leisure, ICT use for social interaction, student gender, and socioeconomic status. The supplementary materials include the R code for analysis, the correlation matrix for each country, and the raw and imputed data. There are three R scripts in the supplementary materials: one for calculating the intraclass correlation coefficient (ICC), one for calculating the regression coefficients of the control variables, namely student gender and socioeconomic status (ESCS), and one for the multilevel mediation analysis. These scripts are named "R code for ICC", "R code for the control variables", and "R code for multilevel mediation", respectively. The correlation matrix contains the bivariate correlations among all the variables in the imputed dataset, calculated separately for each OECD country. The raw data are a subset of the PISA 2015 student dataset containing only the ICT-related constructs, whereas in the imputed data, missing values have been imputed using the expectation-maximization algorithm.
Note. The intraclass correlation coefficient is calculated as the proportion of total variance that is accounted for by the clustering of students in schools.
Note. B: unstandardized model coefficient. 95% CI = 95% bias-corrected confidence intervals based on the bootstrapping method. 95% confidence intervals that do not contain zero indicate statistically significant results. Bootstrapping is based on 1000 samples.
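The ICC defined in the note above (the share of total variance attributable to the clustering of students in schools) can be illustrated with a short sketch. It uses the one-way ANOVA estimator ICC(1), which adjusts for unequal school sizes through an adjusted average cluster size $n_0$. This is a swapped-in estimator: the study computed the ICC from a random-intercept model fitted with lmer in R, which generally gives similar but not identical values. The function name `icc1` and the simulated data are hypothetical.

```python
import numpy as np


def icc1(values, groups):
    """One-way ANOVA estimate of ICC(1): between-cluster share of total variance.

    values: student scores (e.g. a COMPICT-like index); groups: school identifiers.
    Can be slightly negative when clustering is negligible.
    """
    y = np.asarray(values, dtype=float)
    labels, idx = np.unique(np.asarray(groups), return_inverse=True)
    n_total, n_groups = y.size, labels.size
    sizes = np.bincount(idx)
    means = np.bincount(idx, weights=y) / sizes  # school means
    grand = y.mean()
    ms_between = np.sum(sizes * (means - grand) ** 2) / (n_groups - 1)
    ms_within = np.sum((y - means[idx]) ** 2) / (n_total - n_groups)
    # Adjusted average cluster size for unbalanced designs
    n0 = (n_total - np.sum(sizes ** 2) / n_total) / (n_groups - 1)
    return (ms_between - ms_within) / (ms_between + (n0 - 1) * ms_within)


# Usage: 200 hypothetical schools of 30 students, true ICC = 0.2 / (0.2 + 0.8) = 0.2
rng = np.random.default_rng(2015)
schools = np.repeat(np.arange(200), 30)
scores = rng.normal(0, np.sqrt(0.2), 200)[schools] + rng.normal(0, np.sqrt(0.8), schools.size)
print(round(icc1(scores, schools), 3))  # should be close to 0.2
```

An ICC well above zero signals that a single-level analysis would understate standard errors, which is the motivation for the multilevel mediation model.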

Experimental Design, Materials and Methods
Based on the PISA 2015 dataset, 30 OECD countries were selected for analysis, resulting in a sample of 201,652 students from 7708 schools. During data pre-processing, missing data were imputed with the expectation-maximization (EM) algorithm [1] using the statistical package SPSS 20. A 1-1-1 multilevel mediation model was adopted for data analysis, which was performed using the lavaan package [5] in R (R Core Team, 2019).
To obtain the original data, the official OECD website (URL: http://www.oecd.org/pisa/data/2015database/), which hosts all the datasets pertaining to the Programme for International Student Assessment (PISA) 2015, was accessed. For the purposes of this study, only the student questionnaire data file (PUF_SPSS_COMBINED_CMB_STU_QQQ.zip) was downloaded. As this file contains a large number of variables, only the ICT-related variables used in this study were retained, and the original SPSS format was kept unchanged. The selected variables were then inspected using the "Analyze > Descriptive Statistics > Descriptives" function in SPSS to identify the percentage and patterns of missing data. Next, expectation-maximization was performed on these original variables to impute missing values, using the "EM" option in "Missing Value Analysis". These pre-processing steps produced the final data used in the original study, which contain the variables described below.
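As a rough illustration of what EM imputation does, the sketch below estimates a multivariate normal mean vector and covariance matrix by expectation-maximization and fills each missing value with its conditional expectation given the observed values in the same row. It is not the SPSS Missing Value Analysis implementation: convergence criteria, and whether imputed values include random residuals, may differ. The function name `em_impute` is hypothetical, and the row-by-row loop is kept for readability; for 200,000 students one would group rows by missingness pattern.

```python
import numpy as np


def em_impute(data, max_iter=200, tol=1e-8):
    """EM for a multivariate normal model with missing values coded as np.nan.

    Returns (filled data, estimated mean vector, estimated covariance matrix);
    each missing entry is replaced by its conditional expectation.
    """
    x = np.array(data, dtype=float)
    miss = np.isnan(x)
    n, p = x.shape
    mu = np.nanmean(x, axis=0)                # start: available-case means
    filled = np.where(miss, mu, x)            # start: mean-filled data
    sigma = np.cov(filled, rowvar=False, bias=True)
    for _ in range(max_iter):
        correction = np.zeros((p, p))         # summed conditional covariances of missing parts
        for i in np.flatnonzero(miss.any(axis=1)):
            m, o = miss[i], ~miss[i]
            s_mm = sigma[np.ix_(m, m)]
            if o.any():
                s_mo = sigma[np.ix_(m, o)]
                # Regression of missing on observed variables: s_mo @ inv(s_oo)
                coef = np.linalg.solve(sigma[np.ix_(o, o)], s_mo.T).T
                filled[i, m] = mu[m] + coef @ (x[i, o] - mu[o])   # E-step: conditional mean
                correction[np.ix_(m, m)] += s_mm - coef @ s_mo.T  # conditional covariance
            else:
                filled[i, m] = mu[m]
                correction[np.ix_(m, m)] += s_mm
        # M-step: re-estimate parameters from the expected sufficient statistics
        new_mu = filled.mean(axis=0)
        centered = filled - new_mu
        new_sigma = (centered.T @ centered + correction) / n
        done = np.abs(new_mu - mu).max() < tol and np.abs(new_sigma - sigma).max() < tol
        mu, sigma = new_mu, new_sigma
        if done:
            break
    return filled, mu, sigma


# Usage: two correlated hypothetical indices, about 20% of the second one missing
rng = np.random.default_rng(0)
complete = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=500)
data = complete.copy()
data[rng.random(500) < 0.2, 1] = np.nan
filled, mu, sigma = em_impute(data)
print(mu.round(2), sigma.round(2))
```

The correction term is what distinguishes EM from repeatedly plugging in regression predictions: it adds back the uncertainty of the missing entries so that the covariance estimate is not shrunk.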
In accordance with the research questions, two variables were used as control variables (student gender, socioeconomic status), one as the independent variable (adolescents' ICT interest), four as mediating variables (ICT use at home for schoolwork, ICT use at school, ICT use for leisure, ICT use for social interaction), and one as the outcome variable (adolescents' ICT self-efficacy at age fifteen). Because educational data are inherently hierarchical, with students nested in schools, this structure also needs to be taken into account. Therefore, a 1-1-1 multilevel mediation model was constructed to address the research questions. The analysis began with the calculation of the intraclass correlation coefficient (ICC), which gauges the magnitude of the clustering effect caused by the data structure. This was performed using the lmer function of the lme4 package [2] in R. The main analysis was then performed using the lavaan package [5], which can account for the hierarchical structure of the data while estimating the mediation model. This process returned the regression coefficients for the control and main variables, along with their confidence intervals and standard errors.
Table 7. Relative indirect effects of ICT interest on ICT self-efficacy and the proportion mediated.
Note. 95% CI = 95% bias-corrected confidence intervals based on the bootstrapping method. Confidence intervals that contain zero are deemed nonsignificant and highlighted in bold.
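The bias-corrected bootstrap confidence intervals mentioned in the table notes can be sketched for a single mediator as follows. The indirect effect is the product $a \cdot b$ of the X-to-M and M-to-Y (controlling for X) OLS coefficients; students are resampled with replacement, and the percentile cut-offs are shifted by the bias-correction constant $z_0 = \Phi^{-1}(\text{share of bootstrap estimates below the sample estimate})$. This is a simplified, single-level illustration: it ignores the school clustering and the control variables that the lavaan model handles, and the function names are hypothetical. A cluster bootstrap would resample schools rather than students.

```python
import numpy as np
from statistics import NormalDist


def ols(y, *predictors):
    """Ordinary least squares coefficients, intercept first."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def indirect_effect(x, m, y):
    """Product-of-coefficients indirect effect a*b for one mediator."""
    a = ols(m, x)[1]     # path X -> M
    b = ols(y, x, m)[2]  # path M -> Y, controlling for X
    return a * b


def bc_bootstrap_ci(x, m, y, n_boot=1000, alpha=0.05, seed=0):
    """Indirect effect with a bias-corrected (BC) bootstrap confidence interval."""
    x, m, y = (np.asarray(v, dtype=float) for v in (x, m, y))
    rng = np.random.default_rng(seed)
    n = len(x)
    estimate = indirect_effect(x, m, y)
    boots = np.empty(n_boot)
    for k in range(n_boot):
        idx = rng.integers(0, n, n)  # resample students with replacement
        boots[k] = indirect_effect(x[idx], m[idx], y[idx])
    nd = NormalDist()
    # Bias-correction constant z0, clipped so inv_cdf stays inside (0, 1)
    prop = np.clip(np.mean(boots < estimate), 0.5 / n_boot, 1 - 0.5 / n_boot)
    z0 = nd.inv_cdf(float(prop))
    lo_p = nd.cdf(2 * z0 + nd.inv_cdf(alpha / 2))
    hi_p = nd.cdf(2 * z0 + nd.inv_cdf(1 - alpha / 2))
    lower, upper = np.quantile(boots, [lo_p, hi_p])
    return estimate, lower, upper


# Usage with simulated data: true indirect effect = 0.5 * 0.4 = 0.2
rng = np.random.default_rng(42)
x = rng.normal(size=300)
m = 0.5 * x + rng.normal(size=300)
y = 0.4 * m + 0.1 * x + rng.normal(size=300)
print(bc_bootstrap_ci(x, m, y))  # estimate, lower, upper
```

As in the table notes, an interval that excludes zero would be read as a statistically significant indirect effect.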

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships which have, or could be perceived to have, influenced the work reported in this article.