Data Science and Machine Learning Teaching Practices with Focus on Vocational Education and Training

With technological development driving a rapid expansion of data science and machine learning in our everyday lives, a significant gap is forming in the global job market, where the demand for qualified workers in these fields cannot be properly satisfied. This worrying trend calls for immediate action in education, where these skills must be taught to students at all levels in an efficient and up-to-date manner. This paper gives an overview of the current state of data science and machine learning education globally, at both the high school and university levels, while outlining some illustrative and positive examples. Special focus is given to vocational education and training (VET), where the teaching of these skills is at its very beginning. Also presented and analysed are survey results concerning VET students in Slovenia, Serbia, and North Macedonia, and their knowledge, interests, and prerequisites regarding data science and machine learning. These results confirm the need for the development of efficient and accessible curricula and courses on these subjects in vocational schools.


Introduction
Data is everywhere around us and a part of our daily lives in more ways than most of us even realize. The total amount of digital data that we generate every day is growing exponentially. According to estimates, by the end of 2021 there were around 74 zettabytes of generated data in the world (Holst, 2021), a figure expected to double by the end of 2024. A large part of this is due to the increased usage of Internet of Things devices.
This huge amount of data can be exploited by many industries and open new business opportunities. Indeed, according to the final study of the European Data Market tool, the value of the Data Economy exceeded the threshold of 400 billion EUR in 2019 for the EU27 plus the UK, with a growth of 7.6% over the previous year. This was complemented by growth of the Data Market value in 2019 above that exhibited by total IT spending, at 4.9% year-on-year, reaching 75 billion EUR (Cattaneo et al., 2020). More specifically, in the Baseline scenario, the Data Market is forecast to reach 82.5 billion EUR in the period 2020-2025. The Data Economy will grow faster than the Data Market, reaching a value of 550 billion EUR by 2025. Moreover, in the High Growth scenario the Data Market is forecast to reach 107 billion EUR, and the Data Economy a value of 827 billion EUR, by 2025 (Cattaneo et al., 2020). In summary, the Data Market and Data Economy are projected to grow continuously in the years to come, as technology investments and new digital and data platforms help shape the world and adapt it to the new post-pandemic realities.
According to the same study, the number of data professionals reached 7.6 million in 2019, corresponding to 3.6% of the total workforce, an increase of 5.5% over the previous year. There is also an ongoing imbalance between the demand for and the supply of data skills in Europe, with a gap of approximately 459,000 unfilled positions corresponding to 5.7% of total demand. The data skills gap is forecast to continue in all scenarios, as demand will continue to outpace supply (Cattaneo et al., 2020). This is aggravated by the lack of analytical skills, reported as a key challenge by 50% of employers (Kolding et al., 2021). Hence, there is an obvious need for professionals who understand the basics of data science and are able to work successfully in that field.
Different sources give different definitions of data science. For example, in Dhar (2013) data science is defined as "a study of the generalizable extraction of knowledge from data", which correctly addresses the heart of data science, but in a narrow sense. Stanton (2012) defines it as "an emerging area of work concerned with the collection, preparation, analysis, visualization, management, and preservation of large collections of information". Provost and Fawcett (2013) state that "Data science involves principles, processes, and techniques for understanding phenomena via the (automated) analysis of data." A synthesis of these definitions would imply that data science is a very broad field concerned with the collection, processing, preservation, and analysis of data, and with the subsequent inference of knowledge from it.
There are three important pillars of data science: data, technologies, and people (Song and Zhu, 2016).

• Data refers to domain areas, such as relational data and non-relational data;
• Technologies include Hadoop ecosystems, NoSQL, in-memory computing, data mining, machine learning, and cloud computing;
• People include computer scientists, statisticians, domain experts, data scientists, and business analysts.
Among the three pillars, it can be argued that people represent the most important one. More computers, storage, and tools can be bought in order to process big data effectively, but human ability does not scale. Educating people as data scientists is key to addressing the challenges of the big data era. At the student level, teaching data science can provide access to data that intersect with students' lives. Not every student, particularly at high school level, has an interest in becoming a scientist. However, since today's students engage with data in their daily lives, mostly through social media, educators would be missing an important opportunity if they did not prepare students to work with data. Additionally, artificial intelligence itself can be a valuable tool in teaching and designing educational methods (Holstein et al., 2019). So far, some definitions of the necessary set of skills and competencies for data science have already been proposed (Manieri et al., 2015; Mikroyannidis et al., 2017), and they are important for the design of appropriate curricula. Equally important are the expected learning outcomes of given curricula, and the European Qualifications Framework for Lifelong Learning recommends designing the learning outcomes in terms of knowledge (body of related theories and practices), skills (applying knowledge to complete tasks), and competence (ability to use knowledge and skills in work or studies and in professional development) (Larsen et al., 2008).
With the recent technology trends, we as a digital society are generating vast amounts of data. This requires adaptation of the current database and storage technologies and development of new ones (e.g., distributed databases, cloud storage). Additionally, data processing techniques and algorithms are changing so that they tackle large amounts of data, and sometimes even streaming data, more efficiently. This also influences the machine learning algorithms and software toolkits which are specifically designed for big data processing, such as Amazon SageMaker or Azure, among others. Big data presents new opportunities for machine learning, e.g. enabling the learning of patterns in the data from multiple views in parallel (Zhou et al., 2017), but also introduces challenges such as high data dimensionality, model scalability, distributed computing, streaming data (Najafabadi et al., 2015), adaptability, and usability.
This paper will examine the current state of machine learning and data science education at university, high school, and especially vocational education and training levels, with several illustrative examples outlined. Vocational education and training (VET) is the training in skills and teaching of knowledge related to a specific trade, occupation or vocation in which the student or employee wishes to participate, as defined by the European Commission¹. VET is of special interest here since we argue that the teaching of data skills at this level is still in its infancy, but holds a lot of potential for efficiently educating young people and opening new career opportunities for them on a track parallel to the conventional high school-university route.
The paper will also give an overview of an Erasmus+ project called VALENCE (Advancing machine learning in vocational education). As part of the project and of the development of a platform for vocational study courses on data processing, data visualization, and machine learning, a survey was deployed to the students of three VET schools in the Balkans region in order to assess their awareness of and experience with data science and machine learning. The analysis of the survey results will also be given. Some of these results have already been published in Zlatinov et al. (2021), but this paper is a significant extension which includes analysis of more questions from the survey, as well as a more in-depth evaluation of the students' answers. A further addition is the aforementioned examination of the current state of machine learning and data science education at different educational levels. This will show that there are both suitable conditions and an urgent need to implement data science and/or machine learning curricula at the VET level.
The paper's structure is as follows: Section 2 will describe the general state of data science and machine learning education across different educational levels globally, before presenting several examples specifically from vocational education and training. Next, Section 3 will describe the VALENCE project, from which this research has stemmed, and the survey conducted among VET students in the Balkans region regarding their perception of and experiences with machine learning and data science. The survey results will be analysed and discussed in detail before a summary is given. Finally, the concluding Section 4 will emphasize the most important issues going forward.
The standard abbreviations "DS", "ML", and "AI" will be occasionally used throughout the paper for the terms "data science", "machine learning", and "artificial intelligence", respectively.

General State of DS & ML Education across Educational Levels
Globally, it can be argued that the first promising signs of standardizing the teaching of artificial intelligence, machine learning, and data science in elementary and high schools are already present. Touretzky et al. (2019c,b,a) have defined that teaching AI concepts in K-12 (an American expression for the range of years from kindergarten to the 12th grade, which generally covers primary and secondary education in most countries) must cover five main ideas:
• Perception.
• Representation and reasoning.
• Learning.
• Natural interaction.
• Societal impact.
Through their AI4K12 initiative, the authors further argue that the ML concepts that must be covered in K-12 education should include:
• What is learning?
• Approaches (algorithms) to machine learning.
• Types of learning algorithms by learning style.
• Fundamentals of neural networks.
• Types of neural network architecture.
• How training data influences learning.
• Limitations of machine learning.
It is interesting to then examine some global examples of instructional units (classes, courses, workshops, assignments) covering machine learning in particular, while also bearing in mind the concepts and topics listed above. In Marques et al. (2020), the authors have undertaken an ambitious attempt to map and study the state of the art in teaching machine learning concepts in elementary to high school education.
Their comprehensive analysis shows interesting results regarding around 30 relevant instructional units (IUs). It examines:
• The rising trend in the number of IUs covering machine learning per year, essentially skyrocketing since 2018.
• The machine learning approaches/areas covered, where supervised learning is much more popular than unsupervised or reinforcement learning.
• The application domains for the ML algorithms, where computer vision is a dominant area, likely because of the possibilities of easy visualization of data and results.
• The ML processes covered, where data management, model learning, and model evaluation are expectedly dominant over issues such as feature engineering, model testing, or algorithm deployment.
• The tools and frameworks used, where Jupyter Notebook and TensorFlow are dominant.
• The organizational and pedagogical aspects of the IUs, which show that most IUs are in the form of courses and workshops, utilizing mostly lectures, discussions, hands-on activities, and projects, and are mostly targeted at middle school and high school students (not so much at elementary schools).
The general conclusion is that teaching ML in K-12 education is on an upward trend. The analyzed IUs teach different and varying things, from broad ML concepts to specific ML techniques. Artificial neural networks seem to be one of the most commonly taught approaches, which is easily explained by their real-world use and popularity. Furthermore, many of the IUs teach only the basics of machine learning and data management in general, keeping more advanced concepts and underlying algorithms and processes opaque and black-boxed. It is also positive to note that many of the analyzed IUs provide instruction materials for free and use both standard and customized frameworks. However, many of the analysed courses are extracurricular, and there seems to be a general lack of teacher training and unifying standards. It is also clear that further research in this area is necessary.
As stated previously, the increased processing of large amounts of data in practice, in order to find certain regularities or draw certain conclusions, has inevitably made data science an indispensable tool in many areas of everyday life. The world's most prestigious universities, which are often the drivers of new concepts and tools on the market by translating theoretical concepts into practical use, initiated this breakthrough as well, and other universities quickly adapted to the trend. Thus, nowadays nearly every technical university offers a series of courses dealing with data science and machine learning. Most such courses are focused on a specific area or application of data science, which is useful as a tool in solving problems in a specific scientific area. But there are also introductory courses in data science and machine learning at non-technical universities, with a much more general focus and combining knowledge from multiple areas. Such examples can be seen in Hicks and Irizarry (2018), where a very successful introductory course on data science has been implemented as a graduate course in the School of Public Health in Boston, MA; in Brunner and Kim (2016), where researchers at the University of Illinois have developed an online introductory data science course for undergraduates from diverse disciplines; and in Delibašić et al. (2012), where decision tree algorithms were taught in a course on data mining at the School of Business Administration at the University of Belgrade. These courses are of great importance for shaping the pedagogical content for the lower levels of education, because their purpose is to introduce learners at the initial stage of understanding DS & ML to their power, their usability, and the challenges faced by a typical scientist trying to solve problems using these concepts.
Recently, there have been active efforts to promote data science and machine learning at high school level as well (Martins and Gresse Von Wangenheim, 2022). At the European level, Heinemann et al. (2018) have designed and realised a pilot course on data science and big data for grades 11-12 (ages 16-17), at two local gymnasiums in the city of Paderborn, Germany. The curriculum introduced students to the basics of statistics, big data, data collection and visualization, and machine learning, using decision trees as a "white box" approach and neural networks as a "black box" approach. The course also had students work on real data projects in small groups in cooperation with a local company, and discuss the societal aspects and impacts of machine learning, artificial intelligence, and data science in general.
In the USA, Gould et al. (2016) have developed and realised a yearlong course titled "Mobilize introduction to data science curriculum (IDS)" at the Los Angeles Unified School District, which used participatory sensing, a data collection paradigm developed to create different communities whose members collect and analyze data together (Burke et al., 2006). Data was collected via mobile devices, and both the data and the analysis were shared within the community. The course was built around an "inquiry-based" approach to pedagogy, used a variety of open-source software, and focused on data collection, working with data, informal inference, and modelling and prediction.
The "Learn to Machine Learn" (LearnML) project² was an Erasmus+ project targeting innovative solutions for teaching and learning computational thinking, AI and ML (LearnML, 2020). The project implemented a game-based learning toolbox and complementary teaching and learning material for course development using game-based ML activities, targeted at both primary and secondary education, and emphasizing the ethics and threats of AI, data biases, and societal implications.

General Examples of DS & ML Education in Vocational Education and Training
Teaching machine learning and data science in vocational education and training (VET) is as important as teaching it at any other level. Vocational education is no longer considered a mere high school alternative for students who would not continue on to university, but is rightfully seen as career and technical education which helps students learn, at a young age, specific skills that prepare them to work in a particular field. With the aforementioned rise in demand for qualified and skilled professionals in data science and in the IT industry in general, the implementation of DS & ML courses and curricula in VET is more important than ever. The data market needs many different job profiles from different subfields and with various levels of qualification, so offering the option of DS & ML education to young people who have chosen vocational education broadens their future career choices and serves the industry at the same time.
Research has shown some tentative beginnings of implementing DS & ML curricula in VET and of laying the groundwork in this direction (Wu, 2021). Only a few scattered, well-documented instances can be found, yet this is a trend that must be sustained and strengthened. A few such examples are described here, and a specific and ambitious ongoing project for advancing machine learning in vocational education is also presented.
At the Israel Institute of Technology (Mike et al., 2020), a two-level data science and machine learning program for 10th grade computer science pupils has been developed and integrated into the current official high school computer science curriculum in Israel. The basic level is taught in the 10th grade with the goal of developing a project in Python as part of the lab-based learning unit in which students are exposed to proper programming concepts. The extended level is taught in the 11th and 12th grades and is currently under development. It elaborates on both the data science process and machine learning algorithms, with emphasis on deep learning.
The data science curriculum is based on the data life cycle. In order to cover the breadth of the curriculum, students are motivated to collect, clean, explore and model relevant data. The depth of the curriculum includes the complex machine learning algorithms which are often skipped elsewhere due to the limited mathematical and computational backgrounds of the students. Eventually, pupils just learn to use the algorithms.
The curriculum includes four machine learning algorithms: k-nearest neighbors (KNN), perceptron, support vector machines (SVM), and neural networks (NN). The first two algorithms are simple to understand, and the last two are useful in many academic and industrial applications. Concurrently with the algorithms, pupils learn about data types, starting with images and tables, since they are the most visually representative data types and thus easier to understand.
After the pilot program's start, the curriculum designers introduced a final project to help consolidate the acquired knowledge. The pupils are asked to propose their own idea for the project topic: a project for which they have access to data and, above all, a project they can solve. The researchers concluded that class discussions about each pupil's topic selection process were beneficial, even though two or more iterations of proposing ideas were necessary. Given the diversity of the pupils' project choices, grades were awarded for understanding of the problem and of the underlying concepts and steps required to process the data and accomplish the project.
The software stack used in this project is the classical combination of Python, NumPy, pandas, and scikit-learn, except for the first two algorithms, KNN and perceptron, for which students write their own implementations.
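To make concrete what a pupil-written implementation of the two simpler algorithms might look like, the following is a minimal sketch in plain Python. It is not taken from the Israeli curriculum or from Mike et al. (2020); the function names, the toy data, and the choices of Euclidean distance, k = 3, and a fixed number of training epochs are all assumptions made for illustration.

```python
from collections import Counter
import math


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbours and count their labels.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def train_perceptron(points, labels, epochs=20, learning_rate=1.0):
    """Learn weights and a bias for labels in {-1, +1} with the perceptron rule."""
    weights = [0.0] * len(points[0])
    bias = 0.0
    for _ in range(epochs):
        for point, label in zip(points, labels):
            activation = sum(w * x for w, x in zip(weights, point)) + bias
            # Update only when the current point is misclassified.
            if label * activation <= 0:
                weights = [
                    w + learning_rate * label * x
                    for w, x in zip(weights, point)
                ]
                bias += learning_rate * label
    return weights, bias


def perceptron_predict(weights, bias, point):
    """Return +1 or -1 depending on which side of the hyperplane the point lies."""
    activation = sum(w * x for w, x in zip(weights, point)) + bias
    return 1 if activation > 0 else -1


# Toy, made-up data: two well-separated clusters in the plane.
toy_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 5.5), (6.5, 7.0)]
toy_labels = [-1, -1, -1, 1, 1, 1]

toy_weights, toy_bias = train_perceptron(toy_points, toy_labels)
print(knn_predict(toy_points, toy_labels, (1.2, 1.1)))
print(perceptron_predict(toy_weights, toy_bias, (6.4, 6.1)))
```

Both functions fit in a few dozen lines and need no library beyond the standard one, which is consistent with the text's remark that these two algorithms are simple to understand. The perceptron is only guaranteed to converge when the classes are linearly separable, as they are in the toy data.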
Recently, the SEnDIng project³ has designed and delivered a learning-outcomes-oriented, multi-disciplinary VET curriculum, with data science as one of its main focus areas. The curriculum also focuses on Internet of Things and Transversal Skills, the latter being identified to be among the key skills of future employees (Jose and Serpa, 2018). The curriculum has been divided into modules targeting three levels of proficiency: introductory, core, and advanced. The course duration ranges from 1-2 h for the introductory courses, 3-10 h for the core courses, and up to 5-10 h for the advanced courses. The lectures have been made available in the form of online courses with some 100 h of video content. The original program also included 20 h of in-person training on transversal skills and 4 months of work-based training. The curriculum emphasizes what an individual should know and be able to do at the end of the learning process. The main target group of the curriculum are IT professionals with working experience, or graduates from higher education institutions, corresponding to the European Qualifications Framework (EQF) level 5. The SEnDIng project has also built a reference model for the vocational skills, e-competences and qualifications for data science that is compliant with the European e-Competence Framework (e-CF) and the European Skills, Competences, Qualifications and Occupations (ESCO) classification.

VALENCE DS & ML VET Survey
The Erasmus+ KA202 project VALENCE (Advancing machine learning in vocational education)⁴ (Zlatinov et al., 2021) is focused on developing a curriculum and an integrated free and open-source software platform for teaching machine learning and data science. The project will deploy and test this curriculum at three VET high schools that are part of the VALENCE consortium: the Kranj School Centre in Kranj, Slovenia (SL), the Electrical Engineering School "Mihajlo Pupin" in Novi Sad, Serbia (SR), and the Vocational High School "Ilinden" in Skopje, North Macedonia (MK). The final and main objective of the project is to support the uptake of digital teaching and learning approaches and technologies when designing curricula for machine learning and data science in vocational education.
The primary target audience of the project are students attending VET high schools, and an online survey was designed (Zlatinov et al., 2021) with the objective of first assessing the students' awareness of and experience with data science and machine learning. The survey was deployed to the students attending the aforementioned three VET high schools. It was designed in English and then translated by native speakers into the languages of the partner VET high schools: Slovenian, Serbian and Macedonian. The survey was deployed in May 2021 to 1,130 students of all levels and study profiles. There was a slight preference to distribute the survey to Computer Science students, as they were identified as high performers. The data was first preprocessed, and then participants who gave one or more inadequate answers to the questions with a textual response were identified and their answers were eliminated from further analysis. This was done to maintain stronger validity of the overall analysis of the results. Nearly 25% of the participants were discarded in this way, leaving a total of 857 responses to the survey: 550 from participants from Serbia, 170 from North Macedonia, and 137 from Slovenia.
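The response counts above can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures reported in the text; it assumes, for the purpose of the calculation, that all 1,130 students to whom the survey was deployed submitted a response, which the text does not state explicitly. The variable names are illustrative.

```python
# Figures reported in the text; treating 1,130 as the number of
# submitted responses is an assumption made for this calculation.
deployed = 1130
retained_by_country = {"Serbia": 550, "North Macedonia": 170, "Slovenia": 137}

retained_total = sum(retained_by_country.values())
discarded = deployed - retained_total
discarded_share = discarded / deployed

# Share of each country among the retained responses.
country_shares = {
    country: count / retained_total
    for country, count in retained_by_country.items()
}

print(f"Retained: {retained_total}, discarded: {discarded} ({discarded_share:.1%})")
for country, share in country_shares.items():
    print(f"{country}: {share:.1%} of retained responses")
```

Under that assumption, the three country counts add up to the reported 857 responses, and the discarded share comes out slightly below one quarter, in line with the "nearly 25%" stated above.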
The survey comprised 72 questions in 8 sections:
1. General questions: serve to obtain personal information about the student whilst preserving their anonymity, and include age, sex, school and year of studies, specialization, and grade average.
2. Awareness of DS & ML: a series of 6 questions to assess whether the student has heard of or used DS or ML, whether they grasp the usage potential of DS & ML, and whether they know how to define them in their own words.
3. Contact with, and exposure to DS & ML: 12 questions that assess the frequency and continuity of the student's use of modern IT platforms, including social media, video streaming services, video games and communication apps.
4. Interest in DS & ML: 5 questions to quantify the interest of the student in learning data science, machine learning, and statistics, their expectation regarding the difficulty of the areas, and their readiness to learn how to use these tools in particular projects, as well as 2 questions assessing the student's interest in the fields of particular applications of DS & ML.
5. Experience with DS & ML: 2 short questions allowing students to express any practical experience they might have had with DS & ML.
6. General IT and language skills: 6 questions to assess the programming experience of the student, as well as their proficiency in English.
7. Learning preferences: 6 questions for the evaluation and quantification of the perceived benefits and personal preferences of the use of various teaching methods, including traditional lectures, course books, homework projects, video lectures, online courses, interactive demos, and work on practical projects.
8. Extras: a group of 27 miscellaneous questions on topics that include cyberbullying, sports, music, movies, art, and languages.
Of the 72 questions, 45 (sections 1-7) directly relate to DS & ML, and 27 questions are in the Extras section. The latter were used as practical dataset examples in the process of the design and development of the VALENCE curriculum⁵.
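The question counts per section can be checked against these totals. In the sketch below, the counts are those stated in the section list; the number of General questions is not stated there, so it is inferred from the total. The dictionary and variable names are illustrative only.

```python
# Question counts as stated in the section list; the count for the
# General section is not stated, so it is left as None and inferred.
stated_counts = {
    "General questions": None,
    "Awareness of DS & ML": 6,
    "Contact with, and exposure to DS & ML": 12,
    "Interest in DS & ML": 5 + 2,
    "Experience with DS & ML": 2,
    "General IT and language skills": 6,
    "Learning preferences": 6,
    "Extras": 27,
}
total_questions = 72

known = sum(count for count in stated_counts.values() if count is not None)
inferred_general = total_questions - known
ds_ml_related = total_questions - stated_counts["Extras"]

print(f"Inferred number of General questions: {inferred_general}")
print(f"Questions in sections 1-7: {ds_ml_related}")
```

The inferred value would match the six items listed for the General section (age, sex, school, year of studies, specialization, and grade average), although the assumption that each item corresponds to exactly one question is not confirmed by the text.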
As will be discussed in more detail in the following subsections, the results show that although these topics are increasingly present in our daily lives, the students in general do not have ample awareness and experience in the fields of DS & ML, even in VET high schools. Even if most students are aware of DS & ML, only a small portion of them have any practical experience with them or have followed an online tutorial. However, it is encouraging to see that a large portion of the students are interested in learning about them. This reaffirms the need for the design and deployment of an accessible DS & ML curriculum. In fact, their experience with Python shows that a large number of students already have good prerequisites for following such a course.

Demographic Distribution
Figures 1-3 show the demographic distribution of the students, i.e. age, sex, and study profile, both in total and separated by high school (country). It can be seen that the students are aged 14-20, with most of them being 15-17 years old. They are predominantly male, which is not surprising for VET schools in the region. Interestingly, there is a clear difference in the share of female students across the three countries: female students make up 23.5% of the students in North Macedonia, 8.76% in Serbia, and only 2.26% in Slovenia. Regarding their study profiles, almost 50% of students are Computer Science students, followed by Electronics and Automatics, except in North Macedonia, where a more balanced distribution can be found. This is in line with the preferences in distributing the survey, but also reflects the number of students in the different study profiles.

Awareness of DS & ML
The level of awareness of DS & ML was gauged by asking "Where did you hear about ... ?" for DS & ML separately. Fig. 4 shows the analysis of the results. It can be seen that, even though these subjects are not part of their curricula, most students have already heard about their existence, with ML being the somewhat more familiar term. This reflects the omnipresence of these subjects in the students' daily surroundings. Despite these encouraging results, some 40% of the students have not yet heard about ML or DS, validating the need to have these topics included in the curriculum. As expected, the Internet is the dominant source of information over all other sources in the chart.
When asked about the definition of the terms DS & ML, two patterns emerge in the students' answers. The first one is the classic "Data science is the science about data". Interestingly, the second pattern is the first paragraph of the respective Wikipedia page on data science: "Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from noisy, structured and unstructured data, and apply knowledge and actionable insights from data across a broad range of application domains."⁶

Experience with DS & ML
Delving deeper, the students were also asked whether they have used DS or ML in a personal project. Only a negligible handful of students responded affirmatively, with the top three listed answers being neural networks, chatbots, and image processing. Fig. 5 shows how many students have followed an online tutorial about DS or ML, with Slovenian students having the most experience. The most interesting topics of the tutorials were general programming, robotics, and ML applications. However, upon closer analysis, almost half of the provided responses clearly show that the students are still largely unfamiliar with what constitutes DS & ML, and have a hard time drawing the line between them and classical engineering and programming.

Readiness for DS & ML
To indirectly evaluate the readiness for practical work in DS & ML, students were asked whether they have any experience writing code in Python. Fig. 6 shows the results, and also reveals that the Slovenian students are ahead in Python programming experience. In contrast, the use of Jupyter Notebooks is level at 5% in all three high schools. In terms of general programming experience, Slovenian and Serbian students are much more versatile, with many of the students ticking two or more programming language checkboxes. On the other hand, Macedonian students have exclusively and in large numbers marked themselves as knowledgeable in C++, reflecting the focus on this programming language in the Macedonian curricula.

Interest in DS & ML
Next, the students were asked to express their curiosity and interest about learning DS, ML, and statistics, through questions with possible answers between 1 (not interested at all) and 10 (very much interested). Fig. 7 shows that interest in these areas is varied. The mean expressed interest, as can be seen in Table 1, is around 4.4/10, and differs slightly among the three areas. ML is the most interesting, with more than 50% of students expressing interest at 5 and above, while DS and statistics are slightly less interesting. In Fig. 7 we can see that the largest group of students, almost 20%, expressed their interest in learning DS & ML in the middle, choosing 5. There are also pronounced groups giving positive scores of 7 and 10. It should be noted, however, that even though there are a large number of positive responses, there is a significant portion of students (around 17%) who would not want DS, ML or statistics included in their curricula at all. This is rather discouraging, but might correlate with the students who feel negative about the education process in general.
To assess whether the students see DS & ML as something too complex to be learned at school, they were asked "Do DS and ML seem hard to learn?". The answers ranged from 1 (not at all) to 10 (very hard), and the results are shown in Fig. 8. Although there is a significant portion of students who perhaps underestimate the task by answering with 1, the rest of the answers resemble a normal distribution. Most of the students did not feel that the material would be too hard to learn at school.
The students' particular interests in the most popular applications of DS & ML were also queried. The answers varied, but show that students are mostly interested in ML applied to robotics, image processing, and speech recognition. With regard to DS applications, the majority of the answers are uniformly distributed between business intelligence, digital advertising, and internet search.
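The summary statistics used above for the 1-10 interest scale (the mean score, the share of answers at 5 and above, and the most frequent answer) can be computed as in the following minimal sketch. The function name `summarize_interest` and the example scores are hypothetical and are not taken from the survey data; they serve only to show the calculation.

```python
from collections import Counter
from statistics import mean


def summarize_interest(scores, threshold=5):
    """Summarize answers given on a 1-10 interest scale."""
    if not scores:
        raise ValueError("at least one score is required")
    if any(not 1 <= score <= 10 for score in scores):
        raise ValueError("scores must lie between 1 and 10")
    counts = Counter(scores)
    n = len(scores)
    return {
        # Average expressed interest.
        "mean": mean(scores),
        # Fraction of students answering at or above the threshold.
        "share_at_or_above_threshold": sum(1 for s in scores if s >= threshold) / n,
        # Most frequently chosen score.
        "mode": counts.most_common(1)[0][0],
        # Relative frequency of every possible score, including unused ones.
        "distribution": {score: counts.get(score, 0) / n for score in range(1, 11)},
    }


# Hypothetical scores, NOT taken from the survey.
example_scores = [1, 1, 2, 3, 5, 5, 5, 7, 7, 10]
summary = summarize_interest(example_scores)
print(summary["mean"], summary["share_at_or_above_threshold"], summary["mode"])
```

Applied to the actual responses for DS, ML, and statistics separately, the same function would yield the kind of per-area comparison described in the text.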

Learning Preferences
The learning preferences of the students are shown in Fig. 9. Live lectures are the preferred method of learning in all three high schools, while the next four preferred learning methods are almost uniformly distributed among course books, online lectures, videos, and lecture notes. This seems to indicate that students favor the conventional methods of teaching and learning.

Continuing to University
Lastly, it was interesting to assess the willingness of the students to continue their education at university level. The results in Fig. 10 show that a majority of the VET high school students do plan to enroll at university, with Macedonians ahead of the rest with an interest of almost 88%. This is a thought-provoking result, as VET high schools are intended to directly equip students with the technical skills required for the job market, and yet the majority of them still seem to see universities as a natural continuation of their education before entering the workforce.

Summarization
The results of the survey can be briefly summarized in order to underline the main insights and implications for teaching DS & ML in VET. The 857 students from VET high schools that were surveyed were predominantly in the age group 15-17 and came mostly from 5 major study profiles (computer science, electronics, automatics, telecommunications, and power engineering). The three surveyed schools are among the largest VET high schools in their respective countries, and the age groups, gender ratios, and study profiles were extremely representative of the general real-world situation. Furthermore, the students were not told the purpose of the survey, and all responses containing at least one inadequate or nonsensical answer to any question (e.g. jokes, spam, etc.) were discarded, thus reducing the threats to internal validity.
The majority of the students were aware of DS & ML and had heard about them mainly through the Internet. They are, however, inexperienced when it comes to working with DS & ML or following a class or tutorial on them. Still, there seems to be a good basis for work and study in this field, since a substantial percentage of the students have some programming experience, especially in Python. Furthermore, some encouraging numbers show that many of the students would be interested in studying machine learning, data science, and statistics (represented by answers of 5 or above when students were asked to mark their interest on a scale from 1 to 10). Similarly, most of the students do not fear the difficulty of such subjects too much, and would mostly prefer to use conventional methods and materials for learning (live lectures, course books, and lecture slides) over online lectures, videos, or demos. Interestingly, the majority of the students also plan to further their education at university level.
The main takeaway seems to be that students are eager and ambitious to broaden their knowledge in the fields of DS & ML, and that they seem to be aware of their growing significance in the world and on the job market. The preconditions for the deployment of machine learning and/or data science curricula at the VET level also seem to be present, since a large portion of the surveyed students already possess the necessary prerequisites (i.e. interest and experience in programming) to be able to follow such courses.
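The discard rule described in this summary (a whole response is dropped if any single answer is inadequate) can be sketched as follows. How an answer was judged inadequate is not specified in the text, so the check `is_valid_answer`, the question names, and the sample responses below are all hypothetical stand-ins.

```python
def keep_clean_responses(responses, is_valid_answer):
    """Keep only responses in which every answer passes the validity check."""
    return [
        response
        for response in responses
        if all(is_valid_answer(q, a) for q, a in response.items())
    ]


def is_valid_answer(question, answer):
    # Hypothetical rule: ratings must be integers from 1 to 10,
    # and free-text answers must not be blank.
    if question.startswith("interest_"):
        return isinstance(answer, int) and 1 <= answer <= 10
    return isinstance(answer, str) and answer.strip() != ""


# Made-up responses for illustration only.
responses = [
    {"interest_ml": 7, "heard_from": "Internet"},
    {"interest_ml": 42, "heard_from": "Internet"},  # out-of-range rating
    {"interest_ml": 5, "heard_from": "   "},        # blank free-text answer
]
clean = keep_clean_responses(responses, is_valid_answer)
print(f"kept {len(clean)} of {len(responses)} responses")
```

Dropping the entire response instead of only the offending answer is the stricter choice: it reduces the sample size, but avoids mixing careful and careless answers from the same participant.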

Conclusion
This paper shows the current global practices of teaching data science, machine learning, and artificial intelligence at different education levels, with a focus on vocational education and training. A general overview of the trends was given, and many interesting and illustrative examples were identified and described. The results of this review show the growing importance of this subject and further outline the need for a standardized approach to teaching these subjects at all education levels. This especially applies to vocational education and training programmes, which can give young people a fast career path and direct access to the data and IT industry, and help satisfy the ever-expanding needs of that market for a qualified and skilled workforce.
The presented survey results from the VALENCE project also stress this importance, especially considering that many students have great enthusiasm and the prerequisites for learning data science and machine learning but very little awareness and experience, at least in the surveyed countries. Therefore, the development of accessible DS & ML curricula is essential to provide young people with the education and career opportunities they want and deserve. The data and IT industry must also accept this reality and look into playing a more active role in the entire process.

B. Gerazov is an associate professor at the Faculty of Electrical Engineering and Information Technologies (FEEIT), Ss. Cyril and Methodius University in Skopje, Macedonia, where he completed his PhD (2014), MSc (2011) and BSc (2007). His main fields of interest are speech technology, biomedical engineering and, more broadly, machine learning and digital signal processing. He leads the Speech Group at FEEIT, and serves as chair of the IEEE joint chapter of Digital Signal Processing and Engineering in Medicine and Biology in the Macedonian IEEE section. Branislav was a Marie Sklodowska-Curie visiting researcher at GIPSA-lab, Grenoble-INP, France. He has visited and worked with the Institute of Biophysics and Biomedical Engineering at the Faculty of Sciences, University of Lisbon, Portugal. He is also one of the founders and main activists in the Interest Group for Free Software, Open Science and Education (SO@FEEIT).
S. Zlatinov is a researcher in the field of robotics. He is currently pursuing a PhD degree in autonomous vehicles at the Faculty of Electrical Engineering and Information Technologies in Skopje, where he has been actively involved in relevant robotics-related projects and robotics coursework. Stefan also works as a teaching assistant at the same institution. Through these experiences, he has developed a passion for many topics in the robotics field, including robot manipulators, mobile robots, simulation, and the Robot Operating System (ROS).

Fig. 2. Sex distribution of the participants in the survey.

Fig. 3. Study profile distribution of the participants in the survey.

Fig. 4. Awareness of DS & ML of the participants in the survey.

Fig. 6. Experience in Python of the participants in the survey.

Fig. 7. Interest (1-10) of the surveyed students in learning data science, machine learning and statistics.

Fig. 9. Learning preferences among the surveyed VET high school students.

G. Nadzinski is an associate professor at the Ss. Cyril and Methodius University, Faculty of Electrical Engineering and Information Technologies in Skopje, where he has worked in different positions at the Department of Automation and System Engineering since 2011. He received his Ph.D. degree in 2018, his M.Sc. degree in 2013, and his B.Sc. degree in 2011, all in System Engineering and Automation at Ss. Cyril and Methodius University, Faculty of Electrical Engineering and Information Technologies in Skopje. His research interests include data science and machine learning, robotics, industrial control and process automation, and networked control systems.

T. Kartalov is an associate professor at the Ss. Cyril and Methodius University, Faculty of Electrical Engineering and Information Technologies in Skopje, Department of Electronics. He received his Ph.D. degree in 2014, his M.Sc. degree in 2008, and his B.Sc. degree in 2002, all in Electronics and Digital Signal Processing at Ss. Cyril and Methodius University, Faculty of Electrical Engineering and Information Technologies in Skopje. His research interests include many fields of digital image and video processing, using both traditional signal processing approaches and machine learning algorithms. Some of these interests are image reconstruction, image fusion, video coding and compression, robotic vision, image and video classification, video surveillance, traffic analysis, and event detection in video.
M. Markovska Dimitrovska received her Ph.D. degree in electrical engineering from Ss. Cyril and Methodius University, Skopje, North Macedonia, in 2020. Since 2014, she has been with the Department of Electronics, Faculty of Electrical Engineering and Information Technologies, Ss. Cyril and Methodius University, Skopje, where she is currently an Assistant Professor. Her research interests include data science and machine learning, signal processing and power quality.
H. Gjoreski is an Associate Professor at the Ss. Cyril and Methodius University in Skopje, Macedonia. He finished his PhD in 2015 at the Jozef Stefan Institute in Slovenia, and was a postdoctoral researcher in 2017 at the University of Sussex, UK. His scientific and research experience is in the domains of Machine Learning and applied Artificial Intelligence. He has participated in more than 12 international projects, and he is currently the coordinator of the WideHealth European H2020 project. He established and organizes the Data Science Macedonia group, which has more than 1000 members.
R. Chavdarov is a system engineer at the Faculty of Electrical Engineering and Information Technologies in Skopje. He has more than 10 years of experience in information technologies, especially in cloud systems, web technologies and computer security. During his career, he has worked on the design and implementation of various web systems. His research interests are network and cloud technology, artificial intelligence, machine learning and IoT.
Z. Kokolanski received his B.Sc. degree in electronics and telecommunications, and his M.Sc. and Ph.D. degrees in electrical measurements, from the Ss. Cyril and Methodius University in Skopje, Macedonia, in 2007, 2010 and 2013 respectively. Since 2007 he has been with the Department of Electrical Measurements, Faculty of Electrical Engineering and Information Technologies in Skopje, where he is currently an Associate Professor. He is an author of more than 100 papers published in international conferences and journals, and holds two national patents. His research interests include electronic instrumentation, sensor interface circuits, multi-sensor systems and virtual instrumentation.

Table 1. Statistics from the survey regarding the interest (1-10) in learning data science, machine learning and statistics.