Machine Learning Methods and Qualimetric Approach to Determine the Conditions for Train Students in the Field of Environmental and Economic Activities

Abstract: The relevance of environmental and economic activity requires professional training of specialists and, accordingly, new organizational and pedagogical conditions for effective education. It is also necessary to develop control and measuring materials that have all the required qualities (validity, reliability, consistency, significance and objectivity) in order to obtain the most reliable results when justifying the need for and sufficiency of the identified conditions. The intensification of information processes in vocational education leads researchers to search for optimal conditions and tools for achieving pedagogical goals. Among these tools are machine learning methods and the mathematical models built on them for quantitative assessment of the quality of vocational training in the field of environmental and economic activities. The qualimetric approach can be used in pedagogy when an array of observational data is available for a given criterion related to learning conditions, personal qualities of students, etc. An algorithmic model allows one to vary conditions in thought experiments and to test hypotheses. Since pedagogical research takes a long time, choosing conditions according to the most favourable forecast produced by the model allows one to optimize pedagogical resources for achieving the planned results. Rational selection of effective control and measuring materials (CMMs) allows one to determine the need for and sufficiency of organizational and pedagogical conditions, while mathematical modelling allows one to quickly adjust the organizational and pedagogical conditions as a set of opportunities for content, forms, teaching methods, information


Introduction
Recently, the issues of sustainable development of society and of ensuring its environmental and economic security have become increasingly acute [1,2]. Their essence lies in developing environmental and economic relations in directions favourable to nature and society, ensuring economic wellbeing, high-quality living conditions and human health [3][4][5]. To prevent unfavourable environmental and economic situations in industrial and other spheres, professional education has been tasked with forming competence in the field of environmental and economic activities among specialists of various industries; this competence is one of the components of the environmental and economic orientation of an individual [6][7][8][9].
The concept of the Seventh Action Plan of the European Union on the Environment is based on the priority application of the direction "Environmental Technologies" for the innovative and sustainable creation of a green knowledge economy with the achievement of synergies. In response to these challenges, pedagogical science is taking measures to determine the methods and forms for innovatively improving the training of specialists and effectively forming the competencies that allow environmental and economic activities to be carried out [10]. Among the key tools in the training process are control and measuring materials (hereinafter CMMs). CMMs not only have indicative properties that demonstrate the presence of certain qualities in students, but are also full-fledged didactic means, since the monitoring stage of learning makes it possible to record personal results and either correct the process or complete the stage of their formation in a particular subject area [11][12][13]. In light of this approach, CMMs are an essential component in determining the need for and sufficiency of organizational and pedagogical conditions for effective professional training. Moreover, the set of CMMs for the current and final assessment in each specific academic discipline is itself a full-fledged organizational and pedagogical condition that ensures the formation, development and actualization of socially and professionally significant qualities of students [14].
Since the totality of organizational and pedagogical conditions is a system [15], studying its properties requires presenting it sequentially in four categorical plans (processes of any kind, functional structure, organization of the material, and morphology), then laying out the plan of morphology again for all the above plans and continuing this procedure until a specific representation of the object is achieved [16]. Considering CMMs in this epistemological aspect, as an indicative element of the pedagogical system, will reveal the weight of organizational and pedagogical conditions in the resulting indicator of training, which in the context of the present research is an integral indicator of the formation of competence in the field of environmental and economic activity.
The purpose of this study is to identify resources for the express-finding of optimal conditions and ways to achieve pedagogical goals based on mathematical modelling and quantitative assessment of the quality of vocational training of students in the field of environmental and economic activities.
Research objectives:
• Qualimetric assessment of the contribution of individual organizational and pedagogical conditions to the final competence of a graduate of an educational organization based on machine learning algorithms;
• Express determination of the most significant organizational and pedagogical conditions in terms of the formation of the competence of a graduate of an educational organization in the field of environmental and economic activity;
• Development of a mathematical project model that makes it possible to make predictive expectations of the studied competency types and test hypotheses for the inclusion or exclusion of certain significant organizational and pedagogical conditions for the effective implementation of the educational process.

Materials and Methods
It is advisable to carry out a rational selection of effective CMMs, in order to determine the need for and sufficiency of organizational and pedagogical conditions, through an expert assessment of the CMM content in the disciplines of professional modules, identifying the thematic blocks that are most significant for forming specific key skills in the field of environmental and economic activities. The experts themselves should be selected according to special requirements and algorithms. Formal characteristics of candidate experts, such as specialty, academic degree and work experience, do not always make it possible to select a truly professional focus group, so standardized procedures for selecting experts are required [17][18][19][20]. In turn, to assess the contribution of a particular organizational and pedagogical condition to the final competence of a graduate of an educational organization, it is possible to use CMMs together with machine learning algorithms [21].
Organizational and pedagogical conditions are understood as the totality of the possibilities of content, forms, teaching methods, ICTs and applied CMMs for the effective implementation of the educational process. At present, the content and results of professional training, including the programs chosen for this research (44.03.04 and 44.04.04), are regulated by the Federal State Educational Standards [22], which allow some variability in defining the disciplines of a professional course. When preparing the initial data at the first stage of the planned research, a set of the most significant disciplines (in terms of the formation of a graduate's competence in the field of environmental and economic activity) was determined. The expert group selected the academic disciplines that contributed to the formation and development of the studied competence, namely:
1. "Basics of Information Security" (X0);
2. "Jurisprudence" (X1);
3. "Applied Economics" (X2);
4. "Organization Management" (X3);
5. "Economic Theory" (X4);
6. "Fundamentals of Life Safety" (X5);
7. "Ecology" (X6);
8. "Information Technologies in Management" (X7);
9. "Fundamentals of Scientific Research" (X8);
10. "Mathematical Modelling" (X9).
It should be noted that all these disciplines had the same workload: 72 hours (2 credit points).
Justification of the forms and content of CMMs to determine the need and sufficiency of organizational and pedagogical conditions depends on these conditions, since the assessment for each specific discipline is formed precisely by means of CMMs. An individual rating (level of knowledge and skills) of a student is formed in a balanced scorecard for assessing progress in the academic discipline based on the results of performing research tasks, essays, etc.
At the next stage of the research, by formalizing the research task, the initial data are presented through the feature description. At the same time, it is assumed that X is a set of descriptions of characteristics (features) of an organizational and pedagogical condition, expressed quantitatively. Students' balanced scorecard is actually a set of CMMs and the general coefficient of competence formation, expressed quantitatively as a percentage. That is, the quantitative expression of the elements of the set X is the assessment of student's current performance in a scorecard for a particular discipline, and the general response Y will be the final indicator of a graduate's competence formation, obtained during the final certification.
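The feature description above can be illustrated with a minimal sketch. It assumes a table with one row per graduate, ten scorecard columns named X0 to X9 and a response column Y in percent; the column names follow the discipline codes used in this paper, while the numbers are random placeholders and not the study's data.

```python
import numpy as np
import pandas as pd

# Hypothetical synthetic scorecard: 174 graduates, ten disciplines X0..X9.
# All values are random placeholders, not the data of the study.
rng = np.random.default_rng(0)
discipline_cols = [f"X{i}" for i in range(10)]
scores = rng.uniform(40, 100, size=(174, len(discipline_cols)))
data = pd.DataFrame(scores, columns=discipline_cols)

# Hypothetical final competence indicator in percent:
# the mean discipline score plus noise, clipped to the 0..100 range.
noise = rng.normal(0, 5, size=len(data))
data["Y"] = (data[discipline_cols].mean(axis=1) + noise).clip(0, 100)

X = data[discipline_cols]  # feature description: one column per discipline
y = data["Y"]              # general response
print(X.shape, y.shape)
```

In this layout each element of X is a student's current scorecard result for one discipline, and y holds the final indicator obtained at the final certification.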
To test the machine learning algorithms intended to optimize the research and to determine the need for and sufficiency of organizational and pedagogical conditions, the following assumption was made: there is a relationship between knowledge and skills in a particular discipline and the final indicator of competence formation in the field of environmental and economic activity, that is, X → Y. The task is to create a model that identifies the organizational and pedagogical conditions with the dominant influence on the maximum value of the general response Y. The created model was fitted on a sample of competence formation indicators for 174 graduates (Fig. 1). To design the model, it was decided to use a cloud service for data development and analysis, Google Colaboratory [23], and the Python programming language, loading the necessary libraries:

    # connecting required modules
    import pandas as pd
    import numpy as np
    from matplotlib import pyplot as plt
    from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
    from sklearn.model_selection import GridSearchCV
    from sklearn.model_selection import train_test_split, cross_val_score
    from sklearn import metrics
    from sklearn.preprocessing import MinMaxScaler
    from sklearn.metrics import confusion_matrix
Then the quantitative variable is transformed into types according to the accepted order of dividing the general response into types 1.0 (advanced), 2.0 (intermediate) and 3.0. The distribution of the target response by types and the total number of observations are presented in Table 1. Further, to take into account the relationships between the characteristics of the organizational and pedagogical conditions, the generation of additional characteristics was programmed:

    # generation of additional characteristics
    from sklearn.preprocessing import PolynomialFeatures
    poly = PolynomialFeatures(2)
    poly.fit_transform(X)  # the result of this call is not stored
    poly = PolynomialFeatures(interaction_only=True)
    # X now holds a bias column, the original features and their pairwise products
    X = poly.fit_transform(X)

For the selected test sample (20% of the total amount of data in the original sample), the feature values were scaled to the interval from 0 to 1. To balance the test sample before building the model variants (which differ in the 'f1_micro' quality metric, a harmonic mean of precision and recall, and in cross-validation accuracy), the parameter class_weight='balanced' was used.
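The preparation steps described above can be sketched end to end as follows. This is an illustration under stated assumptions rather than the authors' code: the data are synthetic, the tercile thresholds used to form the three types are hypothetical, and the bias column is dropped (include_bias=False), which gives 55 features instead of the 56 produced by the default settings.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler, PolynomialFeatures

rng = np.random.default_rng(1)
X_raw = rng.uniform(40, 100, size=(174, 10))            # placeholder scores
y_percent = X_raw.mean(axis=1) + rng.normal(0, 5, 174)  # placeholder response

# Hypothetical thresholds: upper tercile -> 1.0, middle -> 2.0, rest -> 3.0.
low, high = np.quantile(y_percent, [1 / 3, 2 / 3])
y_type = np.where(y_percent >= high, 1.0,
                  np.where(y_percent >= low, 2.0, 3.0))

# Original features plus all pairwise products X_i * X_j (no squares, no bias).
poly = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
X_inter = poly.fit_transform(X_raw)

# Hold out 20% of the observations, keeping the type proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X_inter, y_type, test_size=0.2, stratify=y_type, random_state=0
)

# Scale to [0, 1] using the training part only, then apply to the test part.
scaler = MinMaxScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)
print(X_train_s.shape, X_test_s.shape)
```

Fitting the scaler on the training part alone is a design choice of this sketch; it keeps information about the held-out observations out of the model, at the cost that scaled test values can fall slightly outside the interval from 0 to 1.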
Using grid search (GridSearchCV) [25] on the balanced data of the test sample, two versions of mathematical models of the final formation of competence in the field of environmental and economic activity were built: the Random Forest Classifier model and the Gradient Boosting Classifier model.
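A grid search over the two model families might look like the sketch below. The parameter grids, the three-fold cross-validation and the synthetic data are assumptions made for illustration; the grids actually used in the study are not given in the text. Since GradientBoostingClassifier has no class_weight argument, the sketch passes balanced per-sample weights to fit instead, which is again an assumption about how balancing could be achieved for that model.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.utils.class_weight import compute_sample_weight

# Synthetic stand-in for the scaled features and the three response types.
X, y = make_classification(n_samples=174, n_features=20, n_informative=6,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Hypothetical, deliberately small parameter grids.
rf_search = GridSearchCV(
    RandomForestClassifier(class_weight="balanced", random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, 5]},
    scoring="f1_micro", cv=3)
rf_search.fit(X_train, y_train)

gb_search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "learning_rate": [0.05, 0.1]},
    scoring="f1_micro", cv=3)
# No class_weight parameter exists for gradient boosting, so balanced
# per-sample weights are forwarded to the estimator's fit method.
gb_search.fit(X_train, y_train,
              sample_weight=compute_sample_weight("balanced", y_train))

for name, search in [("random forest", rf_search),
                     ("gradient boosting", gb_search)]:
    print(name, search.best_params_, round(search.best_score_, 3))
```

After fitting, best_estimator_ on each search object holds the model refitted with the best parameter combination, which can then be evaluated on the held-out part.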

Results and Discussion
The classification report for each of the ranking types of the balanced test sample, presented in Table 2, indicates the following:
• The Random Forest Classifier model poorly classified the advanced ranking type 1.0, for which f1-score = 0.25, and on average it classifies 63% of the test sample cases correctly;
• The Gradient Boosting Classifier model classifies at least 72% of the test sample cases correctly, with a relatively equal classification level for all ranking types 1.0, 2.0 and 3.0.
Error matrices for the Random Forest Classifier (a) and Gradient Boosting Classifier (b) models are shown in Fig. 2. On the initial data of the performed pedagogical research, the Gradient Boosting Classifier model thus makes fewer errors than the Random Forest Classifier model.
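For readers who want to reproduce this kind of comparison, a per-type report and an error (confusion) matrix can be obtained as in the following sketch. The data are synthetic and the classes are labelled 0, 1 and 2 rather than 1.0, 2.0 and 3.0, so the printed numbers will not match Table 2 or Fig. 2.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic three-class stand-in for the study's sample.
X, y = make_classification(n_samples=174, n_features=20, n_informative=6,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rows are true types, columns are predicted types.
cm = confusion_matrix(y_test, y_pred, labels=[0, 1, 2])
# Share of correctly classified test cases: diagonal over total.
accuracy = cm.trace() / cm.sum()

# Per-type precision, recall and f1-score as a nested dictionary.
report = classification_report(y_test, y_pred, output_dict=True,
                               zero_division=0)
print(cm)
for label in ["0", "1", "2"]:
    print(label, round(report[label]["f1-score"], 2))
print("accuracy", round(accuracy, 2))
```

A low f1-score for a single type, as reported for type 1.0 under the Random Forest Classifier model, shows up here as a row of the matrix with few cases on the diagonal.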
To determine the significance of the first 30 features (on which the Gradient Boosting Classifier model split the sample by type), a histogram was built (Fig. 3). With a cross-validation accuracy of 72%, the interpretation of the derived features in the context of the pedagogical research performed is only approximate, but the dominant significance of intersubject connections is evident between the following disciplines:
• X3 and X5 ("Organization Management" and "Fundamentals of Life Safety")
• X4 and X9 ("Economic Theory" and "Mathematical Modelling")
• X2 and X6 ("Applied Economics" and "Ecology")
• X0 and X2 ("Basics of Information Security" and "Applied Economics")
Based on the above data, a necessary and sufficient organizational and pedagogical condition for the effective formation of competence in the field of environmental and economic activity is to ensure continuity between significant disciplines and to actualize interdisciplinary connections through the development of interdisciplinary courses. Thus, for the studied sample of students in the undergraduate and graduate programs 44.03.04 and 44.04.04 "Professional training (by industry)", respectively, such courses can be "Management Ecology", "Mathematical Modelling of Socio-Economic Processes", "Ecological Economics" and "Cyber Economics".
The report on the panel discussion of the AECT-Korean Society for Educational Technology (KSET) generally reflects current trends in educational technology in Korea and the United States with regard to collecting educational data in order to predict and classify student achievement in educational institutions [26]. According to work presented at the conference on Recent Trends in Electronics, Information and Communication Technology (RTEICT), machine learning helps predict student performance, including in pedagogical personalization technologies for interactive pedagogical systems [27][28][29][30]. To improve the accuracy of predicting pedagogical data, foreign scientists use the following methods:
• Ensemble StackingC
• Basic classifiers
• Validation based on pedagogical model data (the case of a mixed LCM model)
• Transactions in neural networks and training systems
• Pedagogical assessment in the era of machine-mediated learning
• Applying relevant academic and personality characteristics from large unstructured data to identify good and poorly suited students
• Built-in regression of fuzzy clustering to predict student performance [31][32][33][34]
Compared with similar studies by foreign authors, the present study differs in:
• Its focus on express determination of the most significant organizational and pedagogical conditions in terms of the formation of the graduate's competence in the field of environmental and economic activity
• Observance of the most general principles of increasing the accuracy of forecasting pedagogical data and similar principles of clustering students' progress
This approach expands the scope for optimizing pedagogical resources to achieve the planned results.

Conclusion
Machine learning methods can be applied, first of all, to substantiate the necessity and sufficiency of organizational and pedagogical conditions for the effective implementation of the educational process. Compared with the Random Forest Classifier model, the designed Gradient Boosting Classifier model gives more accurate predictive expectations of the types of competencies (in this study, competence in the field of environmental and economic activity), with a forecast accuracy of 72% and a small range of errors. The Gradient Boosting Classifier model also allows one to adjust the educational process, if necessary, and to test hypotheses for the inclusion or exclusion of certain significant organizational and pedagogical conditions. A necessary and sufficient organizational and pedagogical condition for the effective formation of competence in the field of environmental and economic activity among students of the undergraduate and graduate programs 44.03.04 and 44.04.04 is to ensure continuity between the most significant disciplines of this educational direction. It is also important to actualize interdisciplinary connections through the development and implementation of interdisciplinary courses such as "Management Ecology", "Mathematical Modelling of Socio-Economic Processes", "Ecological Economics" and "Cyber Economics".
From the standpoint of reliability theory, making optimal decisions under even partial ambiguity and uncertainty of environmental and economic problems is inherently multifaceted and multifunctional; it requires special skills and abilities and must take into account the influence of technical and information progress and the introduction of digital technologies. According to the authors of a study by the international company BCG (Boston Consulting Group), even measures that merely level out the existing environmental and economic imbalance in business activities can provide GDP growth of 0.5-2% [36]. At the same time, the most important positive component of qualitative changes is the intellectualization and development of the human capital of all participants in production processes, based on a scientifically grounded approach to making optimal decisions on the relevant formation of their professional competencies. For this, it is important to improve machine learning algorithms for mathematical modelling and the qualimetric determination of the conditions for training students in the field of environmental and economic activity.