Application of machine learning and rough set theory in lean maintenance decision support system development



Introduction
Maintenance processes have a significant impact on manufacturing companies in areas such as production efficiency, safety and environmental requirements, and customer satisfaction [19,23,28,38]. In addition, delivering high-quality products with tighter tolerances and lower waste and rework levels also depends on well-maintained equipment, which is another reason to develop more efficient maintenance processes [39]. Moreover, Marksberry [52] identified the 'maintenance of machines and devices' as a waste of the production process. Various concepts have been used to increase the reliability and availability of machines and devices; one of them is Lean Maintenance (LMn) [29]. LMn deals with the integration of people in the production process, using certain methods and tools for continuous improvement, as well as the elimination of waste in value-added activities.
The complexity of the various LMn tools and methods, as well as the investment costs, makes LMn implementation a difficult process, although the concept has an impact on the business results of the organization [10]. An inadequate understanding of the relationship between LMn and the operating environment of manufacturing companies causes LMn implementations to fail [17]. Therefore, an important aspect is the development of systems supporting the assessment of the effectiveness of LMn implementation [91].
The aim of the article is to develop a decision support system that will help decision-makers in companies select the LMn methods and tools that have the greatest impact on the company's operational results. Machine learning methods and rough set theory were used in the proposed decision-making system. The main research question was: which of the LMn tools had the greatest impact on reducing the number of unplanned downtimes?
The remainder of this paper is structured as follows. Section 2 presents a literature review on the importance of the maintenance function in manufacturing and on lean maintenance. Section 3 presents the research methodology. Section 4 presents the results of using decision trees and rough set theory to generate classification models for assessing the implementation of lean maintenance. Finally, the conclusions and directions for future research are presented.

Abstract: The lean maintenance concept is crucial for increasing the reliability and availability of the equipment maintained in manufacturing companies. By eliminating losses in maintenance processes, this concept reduces the number of unplanned downtimes and unexpected failures and, at the same time, influences a company's operational and economic performance. Despite the widespread use of lean maintenance, there is no structured approach to support the choice of methods and tools used for improving the maintenance function. Therefore, this paper proposes a new approach based on machine learning methods and rough set theory. This approach supports decision makers in the selection of methods and tools for the effective implementation of Lean Maintenance.
Highlights:
• An approach combining rough set theory and decision trees.
• Rough set theory with different types of algorithms selected for predictive models.
• A classification model for lean maintenance implementation assessment.

The importance of maintenance function in manufacturing
Modern manufacturing companies focus on the availability, reliability and productivity of their manufacturing machines and devices [39,84]. Equipment maintenance and system reliability are important factors that affect the ability to provide quality products to clients on time, comply with legal requirements, and meet business goals. These needs have placed the maintenance function in the spotlight as a strategic function for manufacturing companies [54,58,78,79]. As defined by European standard EN 13306, maintenance is "the combination of all technical, administration and management actions during the life cycle of an item intended to retain it in, or restore it to, a state in which it can deliver the required function (function or a combination of functions of an item which are considered necessary to provide a given service)." This definition expresses the multidisciplinary character of maintenance operations, which include both the technical aspects of the performance of a technical facility and all in-service aspects, referring to the facility itself and to all stakeholders and resources engaged in maintenance processes. According to [66], "Maintenance operations are much like manufacturing operations where both employ processes that add value to the basic input used to create the end product." As maintenance management in a manufacturing company combines various functions (organizational and business), its implementation is complex and requires the utmost attention. According to [89] "maintenance is not just ensuring healthiness of equipment in a facility but it also plays a crucial role in achieving organization's goals and objectives with optimum maintenance cost and maximum production.
[…] and needs to be viewed as a strategic function in an organization". Defining an appropriate maintenance strategy is seen as a way to turn a company's goals into maintenance goals [89]. Maintenance objectives at the strategic and tactical levels of the organization can be defined in five categories [88]. The first category is the maintenance budget, which covers, for example, maintenance costs and maintenance value. The second category describes functional and technical aspects such as availability, maintainability, reliability, Overall Equipment Effectiveness (OEE), productivity, maintenance and output quality. The third category contains plant design life. The fourth category includes the inventory of spare parts and logistics. Finally, people and the environment are counted in the last category. To achieve these objectives, maintenance strategies have evolved over time from reactive maintenance ("run-to-fail" logic) to proactive maintenance (PrM) strategies such as Preventive Maintenance (PM) or Predictive Maintenance (PD). The main goal of PrM strategies is to monitor the equipment and make minor repairs to keep it in good condition and performing well. Research conducted by [32] shows that adopting predictive maintenance in an enterprise can reduce maintenance costs by up to 30% and eliminate up to 75% of breakdowns compared to preventive maintenance.
Today, maintenance, with its strategic role in revenue generation, is seen as a source of added value with a key role in driving performance improvement [51]. According to [37] "advanced practice in maintenance can play a role in achieving more competitive, responsible and sustainable performance in manufacturing companies." In this vein, maintenance should be viewed as an important function in achieving sustainability in manufacturing processes. Many researchers have started to study the impacts and contributions of the maintenance function to more sustainable operations in manufacturing companies. In the economic dimension of sustainable manufacturing, four factors are affected by the maintenance function: quality and productivity, on-time delivery, innovation, and cost [40,49]. In the environmental dimension, the prevention of environmental damage, emissions reduction and land conservation, and the reduction of energy consumption and energy savings are most frequently underlined [60,72]. Finally, in the social dimension of sustainable manufacturing, the relationship of the maintenance function with its stakeholders within and outside the company is underlined, with a particular focus on the maintenance personnel, who are affected by decisions made in the maintenance department [22,41].
The manufacturing industry has now embarked on a digital transformation following the Industry 4.0 paradigm, in which the maintenance organization is expected to play a key role in enabling robust autonomous systems [49]. According to [56], many companies consider the improvement of maintenance processes to be one of the initial stages towards the Industry 4.0 concept.
The growing complexity of the production environment, new requirements and new opportunities force maintenance managers to search constantly for ways to improve activities and processes. Dekker [27] stresses that "the main question faced by maintenance management, whether maintenance output is produced effectively, in terms of contribution to company profits". Although this question was asked many years ago, it is still relevant and very difficult to answer. Many researchers and practitioners have proposed models to solve maintenance-related problems and pointed out that the successful implementation of these models depends on a proper understanding and use of the tools and techniques indicated in them.

Lean and maintenance
Lean Manufacturing (LM) is a globally recognized methodology for the improvement of internal processes, popularised by the book 'The Machine that Changed the World' [15]. The main challenge of LM is to increase customer satisfaction while decreasing waste and losses. The benefits of lean implementation fall into two fields. Firstly, LM eliminates waste, decreases delivery, lead and cycle times, decreases inventories, and increases productivity [11,45]. Secondly, LM improves worker satisfaction, communication, and the decision-making process [25].
LM's demand for reliable and stable machine operation gave rise to another concept, Lean Maintenance [82], also known as Lean TPM (Total Productive Maintenance) [55]. According to [82] "without a Lean Maintenance operation, Lean Manufacturing can never achieve the best possible attributes of Lean", so "first - Lean Maintenance, and next - Lean Manufacturing".
According to [77] "Lean production shifts the attention of maintenance improvement from the technical matters to the management side, which focuses on eliminating the root causes of problems through team-based decisions and implementation".
Smith and Hawkins [82] defined LMn as "proactive maintenance operation employing planned and scheduled maintenance activities through total productive maintenance (TPM) practices, using maintenance strategies developed through application of reliability centered maintenance (RCM) decision logic and practiced by empowered (self-directed) action teams using the 5S process, weekly Kaizen improvement events, and autonomous maintenance together with multiskilled, maintenance technician-performed maintenance through the committed use of their work order system and their computer maintenance management system (CMMS) or enterprise asset management (EAM) system". This definition extends beyond the classic LM concept of TPM by including a reliability approach based on the RCM method. It indicates the need to identify hazards, assess their consequences and, on this basis, determine the criticality of technical facilities and the maintenance activities appropriate for the function performed by the facility.
One of the main steps in improving maintenance processes is to develop a system to identify VA (Value Added) and NVA (Non Value Added) activities and recognize the types of waste [76]. To achieve this, LMn includes several tools and methods, such as 5S, Value Stream Mapping (VSM), Single Minute Exchange of Die (SMED), TPM, and Visual Management (VM) (Figure 1).
These methods and tools simplify maintenance processes and improve maintenance performance. The reduction of waste in maintenance means reduced setup time and increased OEE [9,57,92], better management of consumable materials and spare parts [68], downtime reduction [36,85], a lower Mean Time To Repair (MTTR), and the standardization of maintenance procedures [29]. Barnard [12] pointed out that lean can help to develop a Reliability Program Plan and to select only VA activities for execution. In [53], the authors suggest how LM principles can be adapted to LMn and underline the importance of data in maintenance management decision-making.
Evidence of the implementation of LMn tools is found in various sectors, such as the automotive industry [6,67], aerospace industry [21], power plants [29], textile industry [7,73], food industry [8], and oil and gas industry [24,76], among others. Such evidence points to the universality of LMn tools and their use in different contexts and companies, increasing the importance of LMn as an approach to continuous improvement [30,70,86]. However, the implementation of LMn tools and practices is a time-consuming and costly process that needs continuous effort to produce effective results. Furthermore, there is no roadmap, no unified model and no standard answer on how to achieve lean [93].
Many researchers have identified industrial problems regarding LM implementation [1,59]. To support practitioners in the effective implementation of LM methods and tools, various models suitable for different industries have been developed [2,16]. For the selection of lean tools in a manufacturing organisation, [47] proposes an approach based on fuzzy FMEA, AHP and QFD, [61] uses the AHP method and illustrates it with an example related to construction works, [42] proposes an improved VIKOR method and the idea of multiple-criteria decision-making for LM tool selection, and [80] applies the grey method to LM tool selection.
The above analyses show that the choice of LM practices is not a simple problem. Moreover, the benefits of implementing LM may differ [86]. Since maintenance management in manufacturing companies connects various functions (organizational and business) and activities, the implementation of LMn methods and tools is complex and requires knowledge and skills. Maintenance managers, especially in small and medium-sized enterprises, have difficulty selecting the best ones for the given operational context of the enterprise. Thus, the development of decision-making support tools can assist in appraising the performance of LMn tools and facilitate the adoption of appropriate LMn practices [29].

Research methodology
The purpose of this research was to identify the main factors affecting the effectiveness of LMn implementation in manufacturing companies. To achieve this goal, a machine learning (ML) method and rough set theory (RST) were proposed.
The research methodology consists of two stages. The first stage presents the results of a study, conducted in manufacturing companies, concerning maintenance management and the implementation of lean tools. The obtained data were then preprocessed and statistical analyses were performed (Section 3.1). In the second stage, the data set was first divided into two sets: a training set and a test set. Then decision trees (DT) (Section 3.2) and RST (Section 3.3) were used to generate decision rules. The main goal of this stage was to generate decision rules that show the relationships between the activities undertaken as part of the implementation of the lean maintenance concept and the results achieved. DT and RST were applied with the number of unplanned downtimes (NUD) indicator as the dependent variable. Finally, the obtained results were compared (Section 3.4). The detailed research methodology is presented in Figure 2.

Data collection and preliminary analysis
In the first stage, data for the research were collected in manufacturing companies. Companies of various sizes and from various industries were invited to participate. The research involved companies that had been implementing LMn tools, such as SMED, TPM and 5S, for at least 5 years. The survey method was used. The respondents were mainly representatives of top and middle management as well as employees directly involved in supervising the maintenance process in the company. An important element of the research was to obtain information about the types of benefits identified by enterprises after the implementation of LMn tools such as TPM, 5S and SMED. The data obtained from the survey were then prepared for analysis. The first step was pre-processing, which included data selection and cleaning. The purpose of this step was to remove inconsistent or erroneous data. Records with missing data were removed, which reduced the size of the data set. Statistical analysis was then used to identify the factors that affect the NUD value in the surveyed companies. The results of the first stage of the research are presented in Sections 4.1 and 4.2.
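The removal of records with missing data described above can be illustrated with a minimal sketch. The field names and the three sample records are hypothetical and are not taken from the survey; the sketch only shows why this kind of cleaning shrinks the data set.

```python
# Hypothetical survey records; None marks an unanswered question.
records = [
    {"size": "large", "industry": "aviation", "smed": "yes", "nud_reduction": "10-30%"},
    {"size": "small", "industry": None, "smed": "no", "nud_reduction": "<10%"},
    {"size": "medium", "industry": "automotive", "smed": "yes", "nud_reduction": None},
]


def drop_incomplete(rows):
    """Keep only rows in which every field has a value (listwise deletion)."""
    return [row for row in rows if all(value is not None for value in row.values())]


complete = drop_incomplete(records)
print(len(records), "->", len(complete))
```

Only fully answered records survive, so every incomplete questionnaire reduces the number of cases available for the later models.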

Machine learning and decision trees
In the second stage, the preprocessed data set was first divided into two sets: a training set and a test set. The training set was used to develop the classification models, whereas the test set was used to validate them. First, a machine learning method (decision trees) was used to generate the decision rules (classification model).
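A minimal sketch of such a split is given below. The paper does not state the split ratio, so the 30% test fraction, the fixed seed and the 20 placeholder company identifiers are assumptions made only for illustration.

```python
import random


def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of `rows` and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


companies = list(range(1, 21))  # 20 hypothetical company identifiers
train, test = train_test_split(companies)
print(len(train), len(test))
```

Shuffling before splitting avoids a test set that only contains the last questionnaires collected, which could differ systematically from the rest.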
ML combines solutions from the fields of statistics, computer science, cognitive sciences, recognition theory and many other fields [14]. Developed in the 1990s, data mining methods are among the most widely used IT tools today [33]. These methods are included in modern applications. Moreover, they are used by middle and top management to make decisions based on the knowledge "retrieved" from the internal documentation of the organization and the results of the conducted research. The use of machine learning methods is divided into three stages: data preparation, data analysis (model building) and implementation. ML methods have been successfully implemented in many different areas [14,65], including maintenance management [43,46,74,87,90].
DT are one of the ML methods used for constructing models and are among the most popular and effective of them [13]. DT are mostly built recursively (top-down approach) [34,71].
DT construction is performed by an in-depth search of all available variables and all possible splits in the data set for each decision node (t), choosing the optimal partition [48]. […] the classification of objects can be analyzed [3].
In this study, the Classification and Regression Trees (CART) algorithm was used. This algorithm is one of the basic algorithms, proposed by [18]. The Gini index, also called the impurity measure, was proposed by the authors of the algorithm. The entire space is divided into $q$ regions. For node $m$, $1 \le m \le q$, representing region $R_m$, the Gini index is determined as follows [3]:

$$Q_m = \sum_{j=1}^{s} p_{mj}\,(1 - p_{mj}) = 1 - \sum_{j=1}^{s} p_{mj}^{2}, \qquad (1)$$

where $p_{mj}$ is the conditional probability of the $j$-th class in node $m$ and $s$ is the number of classes. In node $m$ with $n_m$ observations, the conditional probability of the $j$-th class is equal to:

$$p_{mj} = \frac{1}{n_m} \sum_{x_i \in R_m} I(y_i = j), \qquad (2)$$

where $I(\cdot)$ is the indicator function and $y_i$ is the class of observation $x_i$. The decision rules generated by the CART algorithm were used to develop an expert system (with the use of PC-Shell / Aitech Sphinx).
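As a rough illustration of the Gini index used by CART, the sketch below assumes its standard form, one minus the sum of squared class proportions in a node, and estimates each class probability as the share of the node's observations that belong to the class. The function names and the sample labels are hypothetical.

```python
from collections import Counter


def gini_index(labels):
    """Gini impurity of a node: 1 - sum_j p_j^2, with p_j = n_j / n."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


def weighted_gini(left, right):
    """Impurity of a binary split, weighted by the size of each child node."""
    n = len(left) + len(right)
    return len(left) / n * gini_index(left) + len(right) / n * gini_index(right)


node = ["10-30%"] * 4 + ["<10%"] * 4     # two equally frequent classes
print(gini_index(node))                  # maximally mixed for two classes
print(weighted_gini(node[:4], node[4:])) # a split that separates them fully
```

A CART-style search compares candidate splits by this weighted impurity and prefers the split with the lowest value; a value of zero means both child nodes are pure.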
In the system, two blocks were used to create the knowledge base: facets and rules. The facets block was used to declare the decision attributes and their values. The explanatory variables were placed in the decision nodes as decision attributes. The target attribute represented the result of the system's inference. The NUD value was finally obtained in a separate output window. Data from companies (the test set) were used to validate the decision rules developed in the expert system. Then, the confusion matrix and k-fold cross-validation were used to assess the quality of the developed DT. In the confusion matrix the following values were determined: TP (True Positive), TN (True Negative), FP (False Positive) and FN (False Negative). To assess the quality of the developed classifier, the indicators proposed by [31,62,83] were used (Table 1).
The results of this step of the research are presented in Section 4.3.
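The confusion-matrix values TP, TN, FP and FN can be turned into quality indicators as sketched below. The indicators of Table 1 are not reproduced in this text, so accuracy, precision, recall and specificity are used here as commonly reported examples; they may differ from the set actually used. The class labels and predictions are hypothetical, and one class is treated as "positive" at a time (one-vs-rest).

```python
def confusion_counts(actual, predicted, positive):
    """Return (TP, TN, FP, FN), treating `positive` as the class of interest."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1
        elif a != positive and p != positive:
            tn += 1
        elif a != positive and p == positive:
            fp += 1
        else:  # actual is positive, prediction is not
            fn += 1
    return tp, tn, fp, fn


def indicators(tp, tn, fp, fn):
    """Common classifier quality indicators; 0.0 when a denominator is zero."""
    def ratio(num, den):
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


actual = ["10-30%", "10-30%", "<10%", "<10%", ">50%", "10-30%"]
predicted = ["10-30%", "<10%", "<10%", "10-30%", ">50%", "10-30%"]
counts = confusion_counts(actual, predicted, positive="10-30%")
scores = indicators(*counts)
print(counts, scores)
```

Repeating the computation with each NUD class as the positive class gives per-class indicators, which is one way to compare classifiers on the most frequent classes.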

Rough Set Theory
RST was used to develop the second classification model. This theory is recognized as a tool that makes it possible to reduce the input dimension and to reduce the uncertainty and ambiguity of data. Recently, this area has developed very rapidly, as have the possibilities and applications of the theory in ML and decision-making systems [50,64,81]. The main advantage of this theory is the ability to find the relationship between the explanatory variables and the dependent variables, which supports the decision-making process based on data analysis. Moreover, RST allows for dimensionality reduction (elimination of explanatory variables that have no influence on the explained variables). Knowledge extracted using RST is generated in the form of decision rules [50].
The formal description of rough set theory is presented in [63,64]. To start data analysis using this theory, the concepts of an information system and a decision table should be defined. Let $S$ be an information system defined as $S = \langle U, A, V, f \rangle$, where $U$ is a non-empty, finite set of objects (the universe), $A$ is a non-empty, finite set of attributes, $V = \bigcup_{a \in A} V_a$ is the set of attribute values, and $f: U \times A \to V$ is the information function. An information system (IS) is called a decision table $DT$ when there are separate sets of conditional attributes $C$ and decision attributes $D$ such that $C \cup D = A$ and $C \cap D = \emptyset$. The decision table is then described as $DT = \langle U, C, D, V, f \rangle$. Using the properties of RST extends the possibilities of such a table, which leads to a significant simplification of the rules. Consequently, the decision-making system gains the ability to generalize and constitutes an effective and intelligent data processing tool. RST proposes to replace an imprecise concept with a pair of precise concepts, called the lower and upper approximation of this concept [69]. The difference between the upper and lower approximations is the boundary region, which contains all cases that cannot be correctly classified on the basis of current knowledge. If $IS = \langle U, A, V, f \rangle$ is an information system with $B \subseteq A$ and $X \subseteq U$, then the lower approximation $B_*(X)$ and the upper approximation $B^*(X)$ of the set $X$ in the IS are the sets:

$$B_*(X) = \{x \in U : [x]_B \subseteq X\}, \qquad B^*(X) = \{x \in U : [x]_B \cap X \neq \emptyset\},$$

where $[x]_B$ is the set of objects indiscernible from $x$ with respect to the attributes in $B$. The lower approximation of the concept is therefore the region containing all objects that, in the light of the available knowledge, undoubtedly represent the concept. The upper approximation includes the objects for which it cannot be ruled out that they represent the concept [20]. The boundary region contains all those objects for which it is not known whether or not they belong to the given set. There is also a numerical characteristic of the approximation of a set, the coefficient of accuracy of the approximation, which makes it possible to characterize the vagueness of concepts quantitatively [44].
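The lower and upper approximations can be computed directly from a small decision table, as in the sketch below. It assumes the standard definitions: objects with identical values on the chosen attributes form an indiscernibility class, the lower approximation collects the classes fully contained in the target set, and the upper approximation collects the classes that overlap it. The accuracy coefficient is taken in its usual form, the ratio of the sizes of the lower and upper approximations. The five companies and their attribute values are hypothetical.

```python
from collections import defaultdict


def indiscernibility_classes(table, attributes):
    """Group objects that share the same values on `attributes`."""
    classes = defaultdict(set)
    for obj, row in table.items():
        key = tuple(row[a] for a in attributes)
        classes[key].add(obj)
    return list(classes.values())


def approximations(table, attributes, target):
    """Return (lower, upper) approximations of the object set `target`."""
    lower, upper = set(), set()
    for cls in indiscernibility_classes(table, attributes):
        if cls <= target:   # class lies entirely inside the concept
            lower |= cls
        if cls & target:    # class overlaps the concept
            upper |= cls
    return lower, upper


# Hypothetical decision table: conditional attributes tpm, smed; decision nud.
table = {
    "c1": {"tpm": "yes", "smed": "yes", "nud": "high"},
    "c2": {"tpm": "yes", "smed": "yes", "nud": "low"},
    "c3": {"tpm": "yes", "smed": "no", "nud": "high"},
    "c4": {"tpm": "no", "smed": "no", "nud": "low"},
    "c5": {"tpm": "yes", "smed": "no", "nud": "high"},
}
high = {obj for obj, row in table.items() if row["nud"] == "high"}

lower, upper = approximations(table, ["tpm", "smed"], high)
boundary = upper - lower
accuracy = len(lower) / len(upper) if upper else 1.0
print(sorted(lower), sorted(upper), sorted(boundary), accuracy)
```

Companies c1 and c2 have the same conditional attribute values but different decisions, so they end up in the boundary region: the available attributes cannot decide whether such a company achieves a high NUD reduction.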
In this study, RST made it possible to generate a set of decision rules that can be used to construct decision systems. Such rules are usually created in four iterative steps: identification of possible sets of values, isolation of conditional attributes (premises) and decision attributes, creation of decision rules in IF-THEN form, and implementation in the decision system.
As in the case of DT, the developed decision rules were implemented in the expert system. Moreover, the test data set was used to validate the decision rules, and the same indicators were used to assess the quality of the classifier. The results of this step of the research are presented in Section 4.4.

Comparison of the results
In the last step of the research, the results obtained from the assessment of the classification models developed with DT and RST were compared. The comparison analyzed the values of the indicators for DT and RST (Table 1). The analyses were performed for the most frequently occurring classes. The results of this step of the research are presented in Section 4.5.

The structure of the surveyed companies
The research was carried out in manufacturing companies in Podkarpackie Voivodship (Poland). The companies participating in the study used various methods and tools of LMn. Figure 3 shows the percentage of surveyed companies that implemented the various LMn tools.

Fig. 3. Structure of the companies - LMn implementation
The surveyed companies were classified according to, inter alia, the following criteria: size of the organization, type of production, type of industry, and maintenance strategy. The largest groups in the research were large companies (70.77%), companies from the aviation industry (41.54%) and companies with large-batch production (25.68%) (Fig. 4, 5 and 6).

Fig. 4. Structure of the companies - the size of the company
The preventive maintenance (PM) strategy dominated in the analysed companies, in particular: scheduled maintenance inspections (PM), scheduled maintenance inspections and repairs (PM) and autonomous maintenance (AM) (Fig. 7).
The implementation of the TPM system in a production plant significantly facilitates the supervision of machines and technological devices. The main benefit of implementing TPM is the awareness of employees, who find opportunities for continuous improvement in conflicts and the accompanying problems. The decisive role in […] Many of the surveyed companies emphasized that the main effect is a reduction in the number of unplanned downtimes (UD). Any sudden shutdown of a machine in the production process was called an unplanned downtime. The most common reason for such a downtime is a mechanical, electrical or electronic failure, which poses a risk to workplace safety and of failing to maintain proper operating parameters. To assess the effectiveness of the implementation of the LMn concept, enterprises mainly used the OEE indicator and the number of unplanned downtimes (NUD). The research results concerning OEE are presented in [4]. This paper presents the results concerning the impact of LMn implementation on reducing the number of unplanned downtimes (NUD). Figures 8 and 9 show the effects of implementing the LMn system (a decrease in NUD) in the surveyed companies. The analysis of this indicator was based on the following criteria: enterprise size and industry. The results presented in Fig. 8 show that, in medium and large companies, the implementation of LMn most often resulted in a reduction of NUD in the range of 10-30%. The least frequent result was a reduction above 50%. Small companies most often reported a reduction of NUD of less than 10%.

Fig. 8. The effects of implementing the LMn system (decrease in NUD) - size of the company
Companies from the various industries also most often indicated a reduction in unplanned downtime in the range of 10-30%. In 7.15% of the aviation industry enterprises, the NUD indicator was reduced by more than 50%. Table 2 presents the analyzed factors that potentially influence the NUD indicator, together with the p-values.

Statistical analyses
For the analyzed Hypotheses 9 and 12, there is a statistically significant difference in the value of the NUD indicator (p-value = 0.001 and p-value = 0.000 for NUD; H0 rejected, H1 accepted). This means that the reduction in NUD differs in a statistically justified way depending on the factors studied. It indicates that, in the surveyed companies, the decrease in NUD depends on the implementation of the SMED method and on the type of supervision.
The presented analyses made it possible to identify the factors that affect the effectiveness of LMn. Moreover, they showed which factors did not influence the effectiveness of LMn. Although single factors such as the types of machines, Kanban or the way of supervision in the companies do not individually have a significant impact on the effectiveness of LMn, their interaction with other factors may have a significant impact.
Therefore, in the next stage of the research, the use of an ML method and RST was proposed to search for relationships between the identified factors, and thus for their impact on the effectiveness of the implementation of the LMn concept.

Decision trees in evaluating the effectiveness of Lean Maintenance implementation
Not all surveyed companies used the same LMn tools and methods; therefore, CART decision trees were used for the analysis. The main criterion for selecting this method was that it can be used effectively for data sets with numerous gaps in the independent variables. Moreover, this method is insensitive to atypical observations that may come from a different population. The CART classification tree was developed for the studied group of companies, with the reduction in the number of unplanned downtimes (NUD) as the dependent variable.
In the decision tree, the training data set (from 65 companies) was used, and the variables whose impact on the effectiveness of LMn implementation was analyzed (Table 2), e.g. size of the company, type of industry and type of production, were adopted as explanatory variables (predictors). In addition, the following indicators were introduced: the TPM number of actions indicator (NTPMA), the number of preven[…] The NTPMA indicator can take values on four levels, from low to very high. The MSI indicator value is calculated as the sum of the activities' values by the number of implemented activities (4).
Detailed information about these indicators is presented in [4,5].
While building the tree, the following assumptions were made: the costs of misclassifications were equal, the Gini measure as a measure of goodness, the discontinuation of the process of creating new nodes using trimming according to the variance (the stop rule) and the minimum frequency criterion in the split node, and a 10-fold cross validation as a quality measure. A developed tree consists of 15 divided nodes and 16 end nodes, which means that 16 decision rules may be defined. The developed decision tree is presented on Figure 10.
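The Gini measure mentioned above can be illustrated with a minimal Python sketch. The class labels and the candidate split are made-up values for eight hypothetical companies, not data from the survey, and the function names `gini` and `gini_gain` are assumptions introduced only for this illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_gain(parent, left, right):
    """Decrease in impurity obtained by splitting `parent` into `left` and `right`."""
    n = len(parent)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted


# Made-up NUD classes for eight hypothetical companies (illustration only).
parent = ["10-30%"] * 4 + ["30-50%"] * 4
left = ["10-30%"] * 3 + ["30-50%"]
right = ["10-30%"] + ["30-50%"] * 3

print("impurity before split:", gini(parent))
print("impurity decrease:", gini_gain(parent, left, right))
```

A CART-style procedure would evaluate such a decrease for every candidate split and choose the largest one; the sketch shows only the measure itself, not the pruning or cross-validation settings.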
Selected decision rules were defined for the developed tree, namely for the end nodes that achieved the best results in reducing NUD using additional LMn methods and tools. Based on the decision tree, the following decision rules were defined:
1. If the company's type of supervision expressed by the MSI indicator is different from 5.5, the 5S method is implemented in various areas, the company does not represent the metal processing industry, is not a small enterprise and implements a type of production other than small batch production (MS), then it achieves a reduction in NUD in the range from 10 to 30%.
2. If the supervision method expressed by the MSI indicator is different from 5.5, the 5S method is implemented in various areas, the company does not represent the metal processing industry, the MSI indicator is not equal to 5 or 4, and the company mainly has numerical machines or machines referred to as "other", then it achieves a reduction in NUD in the range from 10 to 30%.
3. If the supervision method expressed by the MSI indicator is different from 5.5, the 5S method is implemented in various areas, the company does not represent the metal processing industry, the MSI indicator is not equal to 5 or 4, the company mostly has conventional machines and the average repair time is over 24 hours, then it achieves a reduction in NUD of more than 50%.
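The three rules listed above can be written down as simple predicates, which is roughly how they might be encoded in a rule-based expert system. The sketch below is an assumption-laden illustration: the `Company` fields, the category labels such as `"metal processing"` or `"conventional"`, and the function names are hypothetical encodings, not the authors' implementation. The rules are evaluated independently because, as transcribed in the text, their conditions are not guaranteed to be mutually exclusive.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Company:
    # All field names and category labels are hypothetical encodings.
    msi: float
    five_s_in_various_areas: bool
    industry: str
    size: str
    production_type: str
    dominant_machines: str
    avg_repair_time_h: float


def _shared_conditions(c: Company) -> bool:
    """Conditions that appear in all three rules."""
    return (c.msi != 5.5
            and c.five_s_in_various_areas
            and c.industry != "metal processing")


def rule_1(c: Company) -> Optional[str]:
    if _shared_conditions(c) and c.size != "small" and c.production_type != "small batch":
        return "10-30%"
    return None


def rule_2(c: Company) -> Optional[str]:
    if (_shared_conditions(c) and c.msi not in (4, 5)
            and c.dominant_machines in ("numerical", "other")):
        return "10-30%"
    return None


def rule_3(c: Company) -> Optional[str]:
    if (_shared_conditions(c) and c.msi not in (4, 5)
            and c.dominant_machines == "conventional"
            and c.avg_repair_time_h > 24):
        return ">50%"
    return None


def matching_predictions(c: Company) -> Dict[str, str]:
    """Return the predicted NUD reduction class of every rule that fires."""
    rules = {"rule_1": rule_1, "rule_2": rule_2, "rule_3": rule_3}
    return {name: rule(c) for name, rule in rules.items() if rule(c) is not None}


# A made-up small enterprise with conventional machines and long repairs.
example = Company(msi=3.0, five_s_in_various_areas=True, industry="aviation",
                  size="small", production_type="small batch",
                  dominant_machines="conventional", avg_repair_time_h=30.0)
print(matching_predictions(example))
```

Returning every firing rule, rather than the first one, makes any overlap between the transcribed conditions visible instead of hiding it behind an evaluation order.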
In order to evaluate the quality of the developed classification model (DT), validation on test data was performed.
The obtained decision rules were used to develop the expert system. Data from 25 companies were used to validate the decision rules in the expert system. Among the analyzed companies, the largest group were large companies (70%), mainly from the aviation industry (40%). Large batch production dominated in these companies (35%). The obtained results were then used to test the classification quality of the developed decision rules.
The purpose of the qualitative analysis was to generate confusion matrices for the most frequently occurring classes. When developing a confusion matrix, the analyzed class was considered positive, while all other classes were considered negative. Tables 3 and 4 present the confusion matrices of the NUD classifier for the two most frequently occurring classes: 10-30% and 30-50%. The indicators from Table 2 were used to assess the quality of the classifier (Table 5).
To make the results presented in Table 5 easier to analyze, the indicators were divided into two groups. The first group (marked in red) contains indicators whose value should be as small as possible; for a classifier without errors, the value is 0. The second group (the other indicators) contains indicators whose value should be as high as possible, ideally 1. The results presented in the table indicate that the NUD classifier is more likely to assign objects to the class to which they in fact belong in the 30-50% class (Acc = 1). For the 10-30% class, Acc is 0.96, which means that Err = 0.04.
The main goal of the validation was to confirm that the developed decision rules actually lead to the planned results. The obtained values of the calculated indicators confirmed the high usefulness of the classifiers.
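The one-versus-rest construction of the confusion matrix and the relation Err = 1 - Acc can be sketched as follows. The 25 labels are made up, with a single misclassification chosen only so that the arithmetic mirrors Acc = 0.96 and Err = 0.04; the actual confusion matrices are those reported in Tables 3 and 4. The function names are assumptions introduced for this illustration.

```python
def one_vs_rest_counts(actual, predicted, positive):
    """Count TP, FP, FN, TN when `positive` is the analyzed class and all others are negative."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1
        elif a != positive and p == positive:
            fp += 1
        elif a == positive and p != positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy_and_error(tp, fp, fn, tn):
    """Acc is the share of correct decisions, Err the share of wrong ones."""
    n = tp + fp + fn + tn
    return (tp + tn) / n, (fp + fn) / n


# Made-up labels for 25 hypothetical validation companies (illustration only).
actual = ["10-30%"] * 10 + ["30-50%"] * 8 + [">50%"] * 7
predicted = list(actual)
predicted[0] = ">50%"  # one object of the 10-30% class is misclassified

for cls in ("10-30%", "30-50%"):
    counts = one_vs_rest_counts(actual, predicted, cls)
    acc, err = accuracy_and_error(*counts)
    print(cls, counts, "Acc =", acc, "Err =", err)
```

Because every multi-class error is counted once as a false negative for the true class and once as a false positive for the predicted class, the same set of predictions can yield different Acc values for different analyzed classes.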

Theory of Rough Sets in Lean Maintenance implementation assessment
In this stage, RST was applied to the explained variable NUD. The same training data set (data from 65 companies and the same explanatory variables (predictors)) was adopted as input. The following algorithms were used to generate the decision rules: the exhaustive algorithm (ExhAlg), the coverage algorithm (CovAlg), the genetic algorithm (GenAlg) and the LEM2 algorithm. The scheme for the explained variable "reduction in the NUD" is presented in Figure 11.
Table 6 presents the number of decision rules generated by each algorithm.
The rules generated by each algorithm were used to classify the NUD indicator. The objects (companies) from the appropriate decision tables were classified using the standard voting method. The results of the classification for each of the algorithms are presented in the form of a confusion matrix. The rows of the matrix show the values for the actual decision classes (the values of the dependent variable).
The columns of the matrix present the results of the prediction. Additionally, the matrix contains information about the number of objects belonging to a given decision class, the accuracy and the coverage. Moreover, the true positive rate is presented.
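Classification by voting over decision rules can be sketched as below. The sketch assumes the common variant in which every rule matching an object casts as many votes as the number of training objects supporting it; whether this matches the exact configuration used in the study is an assumption. The rules, attribute names and support counts are made up for illustration.

```python
from collections import defaultdict


def matches(rule, obj):
    """A rule matches when every condition attribute has the required value."""
    return all(obj.get(attr) == value for attr, value in rule["conditions"].items())


def classify_by_voting(rules, obj):
    """Return the decision class with the most support-weighted votes, or None if no rule matches."""
    votes = defaultdict(int)
    for rule in rules:
        if matches(rule, obj):
            votes[rule["decision"]] += rule["support"]
    if not votes:
        return None  # the object is not covered by any rule
    # Ties are resolved in favor of the class that received votes first.
    return max(votes, key=votes.get)


# Hypothetical rules with hypothetical support counts (illustration only).
rules = [
    {"conditions": {"industry": "aviation"}, "decision": "10-30%", "support": 6},
    {"conditions": {"machines": "conventional"}, "decision": ">50%", "support": 4},
    {"conditions": {"size": "large", "machines": "conventional"}, "decision": ">50%", "support": 3},
]

covered = {"industry": "aviation", "size": "large", "machines": "conventional"}
uncovered = {"industry": "food", "size": "small", "machines": "numerical"}

print(classify_by_voting(rules, covered))
print(classify_by_voting(rules, uncovered))
```

Objects for which no rule fires remain unclassified, which is why a confusion matrix built this way reports coverage alongside accuracy.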
Table 7 presents the results of the classification for GenAlg, ExhAlg and LEM2. In the case of the explained variable NUD, the confusion matrices were the same for these algorithms. All 65 objects in the decision table were correctly classified (Total Acc = 1).
Tables 8 and 9 present the results of the classification for CovAlg with different values of the coverage parameter.
The rules created by the coverage algorithm behaved differently. When a small value of the coverage parameter is assumed (0.001 or less), the algorithm generates rules that give the maximum coverage calculated for all decision classes jointly, approximately 0.977 (Table 8). However, with this value of the coverage parameter the classification accuracy is not maximal; it amounts to 0.95. This is caused by the incorrect classification of three objects that were assigned to the class > 50% although they in fact belong to the decision class 10-30%. To increase the accuracy of the classification, the value of the coverage parameter should be increased. For a coverage parameter of 0.12, the accuracy is already 1, which means no classification errors (Table 9). However, the coverage is then lower than before, approximately 0.895. This is due to the lack of classification of two objects from the class <10%, two objects from the class 10-30% and one object from the class 30-50%.
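The trade-off between accuracy and coverage described above can be illustrated with a short sketch. It assumes the simplest global definitions: coverage is the share of objects that receive any decision, and accuracy is the share of correct decisions among the classified objects. The values reported in Tables 8 and 9 may be computed differently (for example per decision class), so the made-up numbers below are not expected to reproduce them.

```python
def accuracy_and_coverage(actual, predicted):
    """`predicted` uses None for objects that no rule classified."""
    classified = [(a, p) for a, p in zip(actual, predicted) if p is not None]
    coverage = len(classified) / len(actual)
    if not classified:
        return 0.0, coverage
    accuracy = sum(a == p for a, p in classified) / len(classified)
    return accuracy, coverage


# Made-up decisions for ten hypothetical objects (illustration only).
actual = ["10-30%"] * 5 + ["30-50%"] * 3 + [">50%"] * 2

# Permissive rule set: every object is classified, one of them wrongly.
permissive = list(actual)
permissive[0] = ">50%"

# Stricter rule set: the doubtful object and one more are left unclassified.
strict = list(actual)
strict[0] = None
strict[5] = None

print("permissive:", accuracy_and_coverage(actual, permissive))
print("strict:", accuracy_and_coverage(actual, strict))
```

In this toy case the stricter rule set removes the error at the price of leaving two objects without a decision, which is the same qualitative pattern as the one reported for CovAlg.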
As in the case of decision trees, the developed decision rules were implemented in the expert system. Again, data from 25 companies were used to validate the decision rules. To assess the quality of the classifiers, confusion matrices were developed by comparing the results reported by the studied companies with the results from the expert system. Table 10 presents the results of the NUD classification for the LEM2 algorithm. Total Accuracy for this algorithm is 0.958.
Table 11 presents the results of the classification for CovAlg.
Total Accuracy for this algorithm is 0.940, which means that the classification ability of this classifier is lower than that of the LEM2 algorithm. Table 12 presents the results of the classification for ExhAlg. Total Accuracy of this classifier is 0.980.
The best results were obtained for GenAlg. All 25 objects in the decision table were correctly classified (Total Accuracy = 1).

Results comparison
Table 13 compares the results for the most frequently occurring classes: 10-30% and 30-50%. The comparison presents the indicator values for the models generated using DT and RST.
Results for the genetic algorithm are not included in Table 13, because they are the same as for the exhaustive algorithm in the marked class 10-30%. For the 10-30% class, the Accuracy ratio shows that the genetic algorithm and the exhaustive algorithm are most likely to assign objects to the class to which they actually belong. Only a slightly worse Accuracy result was obtained for the other two RST algorithms and for DT. The Accuracy results for the 30-50% class are different. The maximum value was obtained for the LEM2 algorithm, the genetic algorithm and DT. The lowest value was recorded for the coverage algorithm. Similar conclusions can be drawn from the general classifier error (Err), keeping in mind that the lower the value of Err, the better the classifier. This shows that the predictive ability of the created models varies depending on the class of the NUD indicator, and the results contained in the discussed table may be valuable for future users of the developed models. Differences in the results can also be seen for sensitivity (TPR), which shows the ability to recognize objects belonging to the distinguished class. For the 30-50% class, the TPR indicator reached the maximum value for all models except the classifier generated with the coverage algorithm. However, in the case of the 10-30% class, the LEM2 algorithm and DT did not reach the value of 1. The results of the TPR index are very similar to those of the NPV, which indicates the probability that an object assigned to the unmarked class by the classifier actually belongs to this class.
One of the best results was obtained for the TNR index, which indicates the ability to correctly classify objects not belonging to the marked class. Comparing the TNR and TPR values for the LEM2 algorithm and DT in the 10-30% class, it can be seen that these classifiers better recognize objects not belonging to this class. A similar situation occurs for the coverage algorithm in the 30-50% class. The values of the Precision index (PPV) were almost identical to those in the TNR.
In the case of the last three indicators from Table 12 (Matthews correlation coefficient, F1-score, and Youden's J statistic), the results calculated for each of them are similar. All three indicators show that the best classifiers for the marked class 10-30% are the classifiers built on the basis of the exhaustive algorithm and the genetic algorithm. However, for the class 30-50%, the best classifiers come from the LEM2 algorithm, the genetic algorithm and DT.
The probability of omitting marked objects by assigning them to an unmarked class is called FNR. This indicator is the lowest in the case of the exhaustive, coverage and genetic algorithms in the 10-30% class. However, in the 30-50% class, all classifiers have the lowest possible FNR value, except for the classifier built on the basis of the coverage algorithm. On the other hand, the FPR and FDR indicators, which refer to the probability of so-called false alarms generated by the classifier, show that this probability is equal to zero for all classifiers except CovAlg in the 10-30% class, as well as ExhAlg and CovAlg in the 30-50% class.
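The indicators discussed in this comparison can all be derived from the four cells of a one-versus-rest confusion matrix. The sketch below uses the standard textbook definitions; the function name `binary_indicators` and the example counts are assumptions made for illustration and do not come from Table 13. Ratios with a zero denominator are returned as NaN.

```python
import math


def _ratio(numerator, denominator):
    """Safe division: undefined ratios are reported as NaN."""
    return numerator / denominator if denominator else float("nan")


def binary_indicators(tp, fp, fn, tn):
    """Standard quality indicators for a marked (positive) class."""
    n = tp + fp + fn + tn
    tpr = _ratio(tp, tp + fn)  # sensitivity
    tnr = _ratio(tn, tn + fp)  # specificity
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        # indicators that should be as high as possible
        "Acc": _ratio(tp + tn, n),
        "TPR": tpr,
        "TNR": tnr,
        "PPV": _ratio(tp, tp + fp),
        "NPV": _ratio(tn, tn + fn),
        "F1": _ratio(2 * tp, 2 * tp + fp + fn),
        "MCC": _ratio(tp * tn - fp * fn, mcc_den),
        "J": tpr + tnr - 1,
        # indicators that should be as low as possible
        "Err": _ratio(fp + fn, n),
        "FNR": _ratio(fn, tp + fn),
        "FPR": _ratio(fp, fp + tn),
        "FDR": _ratio(fp, tp + fp),
    }


# Made-up counts: 9 true positives, 0 false positives, 1 false negative, 15 true negatives.
for name, value in binary_indicators(tp=9, fp=0, fn=1, tn=15).items():
    print(f"{name}: {value:.3f}")
```

With no false positives, FPR and FDR are zero and both TNR and PPV equal 1, while a single false negative lowers TPR, NPV, F1, MCC and J together, which is consistent with the groups of indicators moving in step in the comparison above.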

Conclusions
Many companies use LM mainly to eliminate production losses. These companies not only increase their productivity, but also strengthen their position on the market. It turns out that companies have started to recognize the importance of maintenance, so they have started implementing LMn.
In this paper, the problem of assessing LMn implementation was analyzed. First, data from manufacturing companies were collected and subjected to a preliminary analysis. The chi-square test was used to identify the factors affecting LMn.
Then, machine learning methods were proposed to develop the classification models. These models were developed using DT (CART) and RST (four different algorithms: LEM2, ExhAlg, CovAlg and GenAlg). To develop these models, data obtained from companies that had implemented LMn were used. In the first stage of the survey, information was collected from the companies on the maintenance strategies used, the implemented LMn methods and tools, and the results of the implementation. To assess the benefits of LMn implementation, the NUD indicator was analyzed.
The obtained results indicate that the classifiers obtained with both RST and DT have a high predictive ability. However, the accuracy of the prediction depends on the analyzed class. The predictive model generated by DT shows better predictive ability in the 30-50% class. The situation for RST is slightly different. The model generated with the genetic algorithm has the same high predictive ability for the two most frequently occurring classes. For the LEM2 algorithm, however, better accuracy was achieved for the 30-50% class. It should be noted that this algorithm generates the smallest number of decision rules, which shows that a large number of decision rules is not required to obtain models with good predictive ability. For the 10-30% class, the best predictive ability was obtained for the models generated with the exhaustive and genetic algorithms. The worst predictive ability for the most frequently occurring classes was achieved by the models generated with the coverage algorithm.
The created models have some limitations. First of all, they were developed on the basis of only a small group of companies in a specific region. Secondly, although companies of various sizes and from various industries were invited to participate in the research, large enterprises from the aviation industry were the largest group. As a result, the developed models are based primarily on the experience and effective LMn implementations of these companies, which may limit the applicability of these models in practice. Finally, a high level of detail was adopted in developing the DT model, which may over-fit the model to the data. Thus, it is planned to continue the relevant research in the future to eliminate the limitations of the developed models.
Although the conducted research has some limitations, the presented results can be used by manufacturing companies to predict and assess the effectiveness of the implementation of LMn methods and tools. In addition, the research results can be used by companies and scientists for the effective organization of maintenance and the selection of an appropriate maintenance strategy, but above all for the improvement of activities already implemented in this area.