Online survey data from senior high school students and their parents in China during the outbreak of coronavirus disease 2019

The dataset presents the raw data collected through an online survey of senior high school students and their parents from 24 provinces, municipalities and autonomous regions (96 cities) of China. We conducted the online survey between 26 February and 4 March 2020 using electronic self-administered questionnaires designed in a student version and a parent version. The questionnaires were built with the online survey tool Sojump (Shanghai Information Co.) and released through the WeChat platform (Tencent Corp), passing from principals to head teachers and then to students and parents. All students and parents were asked to answer the questions voluntarily and anonymously after reading the informed consent statement on the front page of the questionnaire. The information collected from students included: 1) demographic characteristics, including sex, date of birth, name of high school, academic year, and self-evaluated performance level; 2) educational levels and occupations of parents; 3) degree preferences, including the willingness to study medicine (before and after the COVID-19 outbreak), preferred medical career (clinician, public health practitioner, pharmacist, nurse or others), and main motivations for selecting or not selecting medical study; 4) COVID-19 infection among acquaintances; 5) health literacy level on infectious diseases, assessed using the Infectious Disease-specific Health Literacy Scale (IDSHL); and 6) anxiety level, evaluated using the Chinese version of the Generalized Anxiety Disorder Screener (GAD-7). Information collected from parents included the sex of their child and the name of the high school the child attended, as well as their own educational level, occupation, anxiety symptoms, attitude toward their child studying medicine, and main reasons for a supportive or unsupportive attitude; these reasons mirrored the motivations and de-motivations for medical study listed in the student-version questionnaire. The date and time of questionnaire completion were recorded automatically by the Sojump system. The dataset was established at the early stage of the COVID-19 pandemic. It is valuable for understanding the immediate psychological impact of the outbreak of an emerging fatal infectious disease on senior high school students and their parents, and can provide evidence for policymakers on mental health intervention and medical education in China. The data are provided with this article.

© 2022 The Author(s)

Value of the Data
• The dataset is important for understanding the immediate psychological impact of an emerging fatal infectious disease, and of the measures taken against it, on Chinese senior high school students and their parents.
• The dataset may help to develop a novel health literacy intervention to reduce anxiety in Chinese senior high school students during the COVID-19 epidemic, because it provides scores for each domain of the Infectious Disease-specific Health Literacy Scale.
• The dataset may help medical educators, researchers and policymakers to improve current medical education systems, as it offers a detailed account of the perceptions and concerns of senior high school students and their parents regarding medical study, which could be used to make a medical career more attractive to outstanding senior high school students.
• The dataset, collected from students in key senior high schools, may be used for comparative analysis with data collected from general senior high schools. In addition, because the data were collected at an early stage of the COVID-19 pandemic, they may be used to evaluate changes at later stages of the epidemic.

Data Description
The outbreak of coronavirus disease 2019 (COVID-19) was first reported in December 2019 in Wuhan, China, and spread very rapidly to all 34 provinces, municipalities and autonomous regions of the country [2,3]. Extensive measures, including travel restrictions, social distancing, home confinement and coronavirus nucleic acid testing, were taken to prevent the spread of the disease. People were asked to stay at home, and the opening of schools was delayed indefinitely. The COVID-19 pandemic affected people's lives, harming both their physical and their mental health [1]. Meanwhile, the efforts and achievements of Chinese medical staff in fighting the fatal disease earned them respect from the whole of society, which may favour the choice of a medical career among students [4].
The survey was conducted at an early stage of the worldwide COVID-19 pandemic, when the first wave in China had initially been brought under control. English versions of the electronic questionnaires are available in Mendeley Data as the supplementary files "Questionnaire (student).pdf" for students and "Questionnaire (parent).pdf" for parents. The final dataset comprised a total of 42,557 participants: 21,141 students, 21,024 parent guardians and 392 non-parent guardians. The raw datasets are provided in Mendeley Data as "student dataset raw.xlsx" and "parent dataset raw.xlsx". The student dataset comprises 66 items grouped into five types of variables: survey-related information, socio-demographic and academic characteristics, preference for medical study, screening for generalized anxiety disorder, and assessment of infectious disease-specific health literacy (IDSHL). The parent dataset contains 38 items in four parts, because the IDSHL scale was not administered to parents (Table 1). The description and assignment of each variable in the two datasets are clarified in Table 2. The online questionnaires could only be submitted after all questions had been answered, so in theory the collected data contain no missing values. However, "PROVINCE" values for 76 observations (56 students and 20 parents) could not be inferred from the school locations and are missing. Data cleaning was conducted using the SAS code available in Mendeley Data as "Data cleaning.sas" and "SAS code for Data cleaning.txt". The original question corresponding to each variable can be found in the questionnaires attached to this article. Finally, the clean datasets are provided as Excel files named "student dataset cleaned.xls" and "parent dataset cleaned.xls", with rows representing observations and columns representing variables.
Table 1. Types of variables in the student dataset and the parent dataset.
All non-parent guardians were excluded from the analyses because of their small number and because the analyses focus on parents only. Details of the datasets for the 21,141 students and 21,024 parent guardians are described below.
The geographic distribution of participants is displayed in Fig. 1. Province was determined by school location. Regions with fewer than 200 participants were classified as "Others". Details of socio-demographic and academic characteristics are shown in Table 3. The age of participants was calculated from the year of the survey and the year of birth. More of the students were female than male. Students were distributed about equally across grade one through the graduating year, and each of these groups was larger than the group of students repeating the graduating year. Most guardian participants were women, married, non-medical professionals, parents, and had only one child. Table 4 shows the preference for medical study before and after the COVID-19 outbreak. The student and parent questionnaires are similar in this part, covering children's performance, students' intention to study medicine, expected medical majors, and career decision making. Only participants who chose "Yes" to the question "Now do you plan to apply for a medical school?" could answer the question "If you plan to study medicine, what is the main reason for the plan". Participants who chose "No" were instead asked "If you have no plan to study medicine, what are your major concerns?".
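The age derivation and the "Others" grouping described above can be illustrated with a minimal Python sketch. It is not the authors' SAS cleaning code; the function names and the list-based input are hypothetical, and it assumes that age is simply the survey year (2020) minus the year of birth, as the text suggests.

```python
from collections import Counter

SURVEY_YEAR = 2020      # the survey ran from 26 February to 4 March 2020
MIN_REGION_SIZE = 200   # regions below this count are labelled "Others"


def age_at_survey(birth_year, survey_year=SURVEY_YEAR):
    """Return age as survey year minus year of birth (assumed rule)."""
    return survey_year - birth_year


def group_small_regions(provinces, min_size=MIN_REGION_SIZE):
    """Relabel every province with fewer than min_size participants as "Others"."""
    counts = Counter(provinces)
    return [p if counts[p] >= min_size else "Others" for p in provinces]
```

For example, `group_small_regions(["A", "A", "A", "B"], min_size=2)` keeps the three "A" entries and relabels "B" as "Others". Observations with a missing province would need to be handled separately before grouping.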
Anxiety symptoms were assessed separately in students and parents using the Chinese version of the Generalized Anxiety Disorder Screener (GAD-7) [5,6]. The GAD-7 score was used to classify anxiety levels, with a score of 0 to 4 as minimal, 5 to 9 as mild, 10 to 14 as moderate, and 15 to 21 as severe anxiety. Table 5 presents the distribution of responses to the GAD-7 scale in students and parents by sex.
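The classification above can be sketched in Python as follows. The cut-offs are those stated in the text; the function names are hypothetical, and the sketch assumes the seven GAD-7 items are coded 0 to 3 each, which is the standard coding of the instrument but may differ from the coding used in the dataset files.

```python
def gad7_total(item_scores):
    """Sum seven GAD-7 item scores, each assumed to be coded 0-3."""
    if len(item_scores) != 7:
        raise ValueError("GAD-7 has exactly seven items")
    if any(score not in (0, 1, 2, 3) for score in item_scores):
        raise ValueError("each item score must be 0, 1, 2 or 3")
    return sum(item_scores)


def gad7_level(total):
    """Map a GAD-7 total score (0-21) to the anxiety level used in the text."""
    if not 0 <= total <= 21:
        raise ValueError("GAD-7 total score must be between 0 and 21")
    if total <= 4:
        return "minimal"
    if total <= 9:
        return "mild"
    if total <= 14:
        return "moderate"
    return "severe"
```

If the dataset stores items on a different scale (for example 1 to 4), the item scores would need to be recoded before calling `gad7_total`.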
The IDSHL of students was measured using the scale developed by Tian et al. [7]. The scale includes four subscales designed to measure basic knowledge, prevention, management or treatment of infectious diseases, and identification of pathogens and infection sources. Supplementary Table 1 shows the correct answer to each question and the scoring of the IDSHL, while Table 6 presents the distribution of responses to the IDSHL in student participants. The total IDSHL score ranges from 0 to 100, with a higher score representing a higher level of health literacy. Most students reported a good or very good physical condition, had no illness or injury during the past 2 weeks, and spent more than 3 hours online. Their main source of health-related information was the internet.

Experimental Design, Materials and Methods
This cross-sectional study was conducted among senior high school students and their parents from 24 provinces, municipalities and autonomous regions (96 cities) of China between 26 February and 4 March 2020. The online questionnaires were designed as a student version and a parent version on the Sojump platform (Shanghai Information Co) and were released through the WeChat platform (Tencent Corp) using a convenience sampling approach. Specifically, the online questionnaires were forwarded to the principals of several key senior high schools by the Admissions Office of the Shanghai Medical College of Fudan University, and then by the principals to the head teachers of classes. Finally, the head teachers released the student and parent versions of the questionnaires to the corresponding WeChat groups, which consisted of students only or guardians (mainly parents) only. In the parents' group, only one parent was included for each student. In rare cases when neither parent was available, one grandparent or another family member was included as the guardian of the student. All students and parents were asked to answer the questions voluntarily and anonymously after reading the informed consent statement on the front page of the questionnaire.
Duplicate questionnaires were removed based on IP address so that each person contributed only one questionnaire. The principals and the head teachers were also encouraged to forward the online questionnaires to their colleagues in other key senior high schools.
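The IP-based removal of duplicates can be illustrated with a short Python sketch. The text does not state which of several submissions from the same IP address was retained, so keeping the earliest one is an assumption made here; the record keys "ip" and "submitted_at" and the function name are likewise hypothetical.

```python
def drop_duplicate_ips(records):
    """Keep one record per IP address: the earliest submission (assumed rule).

    Each record is a dict with hypothetical keys "ip" and "submitted_at";
    "submitted_at" must sort chronologically (e.g. ISO 8601 strings).
    """
    kept = {}
    for record in sorted(records, key=lambda r: r["submitted_at"]):
        # setdefault stores the record only if this IP has not been seen yet
        kept.setdefault(record["ip"], record)
    return list(kept.values())
```

The result is ordered by submission time. Note that an IP-based rule of this kind would also drop distinct respondents who share an address, such as members of one household on the same network.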

Ethics Statements
This study was approved by the Institutional Review Board of the Fudan University School of Public Health (IRB00002408 & FWA00002399). Informed consent was presented on the front page of the questionnaire. By clicking the "start" button, the participant was assumed to have read the information about the survey and to have voluntarily agreed to participate in the study. Considering the nature of an anonymous online survey, and that senior high school students in China are usually aged 16 or above, we did not seek written consent forms from the legal guardians of student participants. However, the legal guardians (mainly parents) of all student subjects were informed of the survey by head teachers in the corresponding parent WeChat groups and agreed to their children's participation in the survey. The authors assert that all procedures contributing to this work comply with the ethical standards of the relevant national and institutional committees on human experimentation, the Helsinki Declaration of 1975, as revised in 2000, and the platform(s)' data redistribution policies.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data Availability
Effect of COVID-19 outbreak on generalized anxiety disorder and medical career decision of senior high school students in China (Original data) (Mendeley Data).