A Semi-Automatic Methodology for Making FMEA Surveys

This paper proposes a semi-automatic methodology to assist the user in creating surveys about FMEA and Risk Analysis, based on a customized use of tools for semantic analysis and, in particular, a home-developed syntactic parser called Kompat Cognitive. The core of this work has been the analysis of the specific FMEA-related jargon and of the common ways in which improvements are described within scientific papers and patents, in order to systematize the linguistic analysis of the reference documents within the proposed step-by-step procedure. The main goal of the methodology is to assist users who are not skilled in the art of FMEA during the analysis of generic and specific features, by considering large numbers of contributions in a limited amount of time. The methodology has then been tested on the same pool of 286 documents, divided between 177 scientific papers and 109 patents, manually analyzed in our previous survey, in order to replicate part of its classifications through the proposed new modality. In this way we evaluated the ability of the methodology both to automatically suggest the main features of interest and to classify the documents according to them.

Keywords: FMEA, Risk analysis, Patents, Parsing, Semantic


Introduction
Since its introduction in 1949, FMEA has had a great following in both the industrial and the scientific community, as testified by the vast multitude of related documents in the scientific and patent literature: to date, there are more than 3600 papers in the Scopus DB alone and 146 patents in the Espacenet DB for the single keyword "FMEA", without synonyms, with a trend of constant growth over the years. The great majority of these contributions deal with modifications of the FMEA procedure and integrations with new methods and tools, in order to enlarge the field of application and to improve the efficiency of the analysis, e.g. by reducing the required time or by finding more results.
In order to orientate among the many contributions, the surveys proposed in the literature can play a fundamental role, even if they are limited in the number of considered documents, never exceeding one hundred, and cover only scientific papers from journals. Bouti and Kadi (1994) analyzed, within scientific papers about FMEA, the description and review of its basic principles, the types, the improvements, the computer automation codes, the combination with other techniques and specific applications. Sutrisno and Lee (2011) analyzed, through a literature survey, FMEA applications for enhancing service reliability: they determined how FMEA is focused on profit- and supply chain-oriented service business practices, and identified research opportunities related to the enhancement of the Risk Priority Number (RPN), reprioritization, the versatility of its application in the service supply chain framework and the non-profit service sector, as well as the combination with other quality control tools, as topics for further investigation. Tixier et al. (2002) studied 62 methodologies about Risk Analysis by separating them into three different phases (identification, evaluation and hierarchisation) and by studying their inputs (plan or diagram, process and reaction, products, probability and frequency, policy, environment, text, and historical knowledge), the implemented techniques for analyzing risk (qualitative, quantitative, deterministic and probabilistic) and their outputs (management, list, probabilistic and hierarchisation). Liu et al. (2013) analyzed the approaches proposed to overcome the limitations of the conventional RPN method within 75 FMEA papers published between 1992 and 2012, by identifying which shortcomings attract the most attention, which approaches are the most popular, and the inadequacies of the approaches. Other authors focused on analyzing specific kinds of application of the FMEA approach.
Dale and Shaw (1990) studied how 78 companies of the United Kingdom motor industry apply FMEA by identifying some common difficulties such as time constraints, poor organizational understanding of the importance of FMEA, inadequate training and lack of management commitment.
An attempt to overcome this limitation has been made by our previous surveys (Spreafico et al., 2017; Spreafico and Russo, 2019a), where we analyzed a representative pool of scientific papers (220) and patents (109), classifying them into four groups of common improvements dealing with the applicability of the method, the representation of the cause and effect chain, risk analysis and the integration with the problem-solving phase. Each group has a series of subclasses about the subgoals and the integrations (methods and tools).
A common limitation of all these surveys regards their reference time period and the onerousness of their execution and updating. To deal with such problems, automatic tools and techniques for text mining can be considered, which can help for different purposes: knowledge sources both for papers (e.g. Google Scholar, Scopus) and for patents (e.g. Espacenet), bibliographic search tools implementing different techniques (e.g. Boolean or semantic logics), tools for managing the documents (e.g. Mendeley), tools for text summarization and topic extraction, and software for data representation (e.g. D3.js, Google Charts). However, an automatic methodology to assist the researcher in organizing and managing documents about FMEA and Risk Analysis is still missing in the literature.

The Procedure of Analysis
In order to overcome the open problems from the previous survey, we believe that a semi-automatic methodology based on semantic analysis could be a possibility, as recently demonstrated in other fields of application, e.g. sentiment analysis for social media monitoring, e-discovery for legal literature and GoPubMed for biomedical texts.
In our case, starting from the same pool of documents of our previous survey (Spreafico et al., 2017), we analyzed them through home-built software, called Kompat Cognitive (Russo et al., 2018), and we compared the linguistic rules used for the linguistic analysis (e.g. logical analysis) with the specific FMEA terminology. Kompat Cognitive is an advanced version of our previous Kompat, a syntactic parser that allows the user to easily set up a sequence of terms and to automatically extract the list of terms linked to it by selecting the semantic relations. This tool has previously been used in other fields as well, such as circular economy and the investigation of the market potential of a product (Russo et al., 2019).
Through this work we collected some common linguistic forms used to describe FMEA improvements within papers and patents at different levels of detail: pursued goals, specific strategies of intervention and proposed integrations with methods and tools. We then classified the results within a series of specific steps in order to define a systematic methodology to be automatized.
In the following, the proposed methodology is presented in detail.

STEP 1 - Building the Electronic Pool of Documents
The first step of the proposed methodology regards the digitalization of the documents, where the required output files depend on the tools used, as regards the format, the syntax and the organization of the content within specific text tags. In our case, we opted for the definition of a single textual file (i.e. XML) for each document, which must be named with a univocal ID, in order to allow the user to trace back the document containing a given content. The content of the electronic document is organized through text fields, enclosing specific parts of the document (e.g. introduction, state of the art and proposal), so as to allow the analysis of selected parts. In this way, during the enumeration of the specific proposed contributions, we can avoid redundancies present within the state of the art of the considered documents. A limitation of these files is the impossibility to process images, tables and graphs: only the semantic relations within the sentences can be processed.
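As a minimal sketch of this step, the snippet below builds one XML file per document with a univocal ID and per-section text fields. The tag names (`document`, `introduction`, `state_of_the_art`, `proposal`) and the ID format are illustrative assumptions; the paper only specifies that the content is split into fields so that selected parts can be analyzed.

```python
# Sketch of STEP 1: one XML tree per document of the pool, named with a
# univocal ID so that any extracted sentence can be traced back to it.
# Tag and field names are assumptions, not the actual Kompat schema.
import xml.etree.ElementTree as ET

def build_document_xml(doc_id, sections):
    """Build an XML tree for one document.

    sections: dict mapping field name -> plain text (images, tables and
    graphs cannot be represented, per the stated limitation).
    """
    root = ET.Element("document", id=doc_id)
    for field, text in sections.items():
        ET.SubElement(root, field).text = text
    return ET.ElementTree(root)

tree = build_document_xml("P042", {
    "introduction": "FMEA is a widely used reliability technique.",
    "state_of_the_art": "Previous works integrate FMEA with fuzzy logic.",
    "proposal": "We propose a scenario-based FMEA extension.",
})
# Restricting the analysis to the 'proposal' field avoids redundancies
# coming from the state-of-the-art sections of the pool.
proposal = tree.getroot().find("proposal").text
```

Analyzing only the `proposal` field is one way to implement the redundancy avoidance described above.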

STEP 2 - Query Definition
Once the electronic pool has been defined, it has to be uploaded onto the server, where it can be processed by the Parser tool, which, in response to a query consisting of a single keyword (e.g. a noun, verb or adjective) defined by the user, provides the main linguistic relations between the keyword and other terms (e.g. subjects, verbs, objects). These relations are identified within the single sentences of the processed documents, and they also include the relations with modifiers of the keyword (e.g. synonyms, meronyms, hypernyms), which are identified on the basis of statistical patterns over the considered pool. The kinds of relations identified by the Parser differ depending on the linguistic nature of the used keyword.
If we use a substantive (e.g. FMEA), the software provides: (1) the modifiers, i.e. the adjectives or the substantives acting as adjectives (e.g. Design FMEA, Economic-Based FMEA), (2) the nouns and verbs modified by the keyword (e.g. FMEA template), (3) the verbs using the keyword as object (e.g. perform FMEA), (4) the verbs with the keyword used as subject (e.g. FMEA is …, FMEA generates …), (5) the substantives linked to the keyword through AND/OR relations (e.g. FMEA and TRIZ; Spreafico and Russo, 2019b) and (6) the prepositional phrases (e.g. … of FMEA, … through FMEA). When we instead use a verb as keyword, we can identify: (1) the modifiers (e.g. effectively improve), (2) the objects (e.g. … improve quality, … improve design), (3) the subjects (e.g. TRIZ improves …) and (4) other particles (i.e. prepositions and adverbs) used before or after the verb (e.g. determine AND select). Each provided linguistic relation can then be manually checked by the user: by selecting it, the tool provides the list of the sentences of the pool documents which contain it. In this way, we can verify their adherence to the purposes and the context of the research. However, although the qualitative level (precision) of the analysis, along with the quantity of the provided results (recall), achieved through the Parser proved to be more than acceptable in several applications from different fields, some considerations are required for this specific case. In particular, Table 1 summarizes the main terms that can be used as keywords to start the analysis, which have been enriched with synonyms and other terms that iteratively emerged during the analysis, as explained in the following (Step 3). We then searched for the common linguistic constructions, involving prepositions (e.g. by, for), more articulated constructs (e.g. with the aim to) and the identified terms (verbs and nouns), and we analyzed their ways of use within the documents (e.g. to introduce a tool, a goal or a strategy, or to relate these terms among them). As a result, some common forms have been identified.
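The relation types listed above can be illustrated with a toy extractor over POS-tagged sentences. This is not the actual Kompat Cognitive parser (which is proprietary); it only mimics two of the relation kinds, namely verbs taking the keyword as direct object (e.g. "perform FMEA") and modifiers placed right before the keyword (e.g. "Design FMEA").

```python
# Toy illustration of two STEP 2 relation types, assuming sentences are
# already POS-tagged as (token, tag) pairs. A real parser would use full
# syntactic analysis instead of simple adjacency.
def keyword_relations(tagged_sentences, keyword):
    verbs, modifiers = set(), set()
    for sent in tagged_sentences:
        for i, (tok, pos) in enumerate(sent):
            if tok.lower() != keyword.lower():
                continue
            if i > 0:
                prev_tok, prev_pos = sent[i - 1]
                if prev_pos == "VERB":              # keyword as object
                    verbs.add(prev_tok.lower())
                elif prev_pos in ("ADJ", "NOUN"):   # modifier of keyword
                    modifiers.add(prev_tok)
    return verbs, modifiers

sentences = [
    [("We", "PRON"), ("perform", "VERB"), ("FMEA", "NOUN")],
    [("Design", "NOUN"), ("FMEA", "NOUN"), ("is", "VERB"), ("common", "ADJ")],
]
verbs, modifiers = keyword_relations(sentences, "FMEA")
```

On the two example sentences, the extractor recovers "perform" as a verb using FMEA as object and "Design" as a modifier, mirroring the examples given in the text.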
The construct "In order to" (see Figure 1) is generally used to explain the reason why a determined integration, e.g. a method or a tool, has been introduced to improve FMEA: to achieve a generic goal (e.g. reduce the required time for the application of the method) or to pursue a strategy (e.g. determine more failures through a schematic representation). We noted that both of them are expressed through a verb in the infinitive form and an object. In addition, "In order to" can be followed by the particles "for" or "by", respectively coupled with a noun or a verb in "-ing" form, to introduce another strategy or another goal. The constructs "With the aim of" and "With the goal of" (Figure 2) are also exploited to express the goals and the strategies related to a determined integration (methods and tools) and to relate a strategy with a goal, or vice versa. Other constructs are instead related to the declaration of the goals (Figure 3) of the proposed FMEA modifications. These are the prepositions "Through" and "By", which are used after the declaration of the goal itself (expressed through a verb and an object) to relate an integration or a strategy. Finally, the particle "For" has sometimes been used after the integration, preceded by the preposition "by", to introduce the related strategy, in turn expressed through a noun.
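The constructs above can be sketched as simple pattern matching. The regular expressions below are simplified approximations of the collected forms ("in order to" plus an infinitive, "with the aim/goal of" plus a gerund, "by" plus an "-ing" form), not the exact rules used by the parser.

```python
# Hedged sketch: detecting the common linguistic constructs described in
# the text. Patterns are deliberately coarse approximations.
import re

CONSTRUCTS = {
    "goal_or_strategy": re.compile(r"\bin order to\s+(\w+)", re.I),
    "aim": re.compile(r"\bwith the (?:aim|goal) of\s+(\w+ing)\b", re.I),
    "strategy_by": re.compile(r"\bby\s+(\w+ing)\b", re.I),
}

def find_constructs(sentence):
    """Return, for each construct found, the verb/noun it introduces."""
    hits = {}
    for name, pat in CONSTRUCTS.items():
        m = pat.search(sentence)
        if m:
            hits[name] = m.group(1)
    return hits

hits = find_constructs(
    "TRIZ is integrated in order to reduce the required time "
    "by improving the schematic representation."
)
```

On the example sentence, the goal verb "reduce" and the strategy "improving" are captured, which is the kind of goal/strategy pair the constructs are meant to surface.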

STEP 3 - Semantic Expansion and Counting of the Occurrences
The activity carried out so far constitutes the basis both to automatically identify the main features of interest and to classify the documents according to them. However, in order to obtain a significant result regarding the document classification, the set of the identified terms has to be expanded by considering also synonyms and other related terms. In fact, the same concept can be expressed within the documents in a multitude of different textual forms, which generally increase as the investigated concept becomes more abstract. Table 2 shows, as an example, some of the combinations of verbs and objects used to express the concept "Solve problem", found within the considered pool of documents. Carrying out a linguistic expansion manually is undoubtedly a difficult and onerous task, and some considerations must be taken into account. As concerns the expansion of generic terms, i.e. verbs (e.g. Solve) and common nouns (e.g. Interface, Design), a simple dictionary can be sufficient to achieve acceptable results, while for the specific nouns and concepts related to FMEA (e.g. Root Causes) and its integrations (e.g. Quality Function Deployment), knowledge about the subject is required. This is because they are often referred to also through their multiple acronyms: e.g. the TRIZ method can be reported as "Theory of Inventive Problem Solving", "TIPS" or "Theory of the resolution of invention-related tasks". Moreover, there are also some variants of the methods that are typically used in different applications: Table 3 summarizes some applications of fuzzy logic, which have been identified by the Parser by analysing the reference FMEA pool of documents. The main problem of the manual expansion of the synonyms of methods and tools is the excessive amount of time required, which typically increases when the analyst is less expert.
In addition, if the process is carried out a priori, it can be useless, because the expansion of the synonyms, without knowing those effectively used within the pool, can digress through extraneous terms. Fortunately, some of the available tools for semantic analysis, including the Parser, are able to automatically identify the modifiers of the used keywords within the considered pool, with a sufficient degree of accuracy. This functionality is also particularly useful to discriminate specific uses of the variants of the integrations in relation to the context of use.
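A minimal sketch of the expansion step is a map from canonical concepts to the textual variants found in the pool, used to count the documents mentioning any variant. The TRIZ variants below come from the text above; the "solve problem" variants are illustrative, and treating the map as a hand-curated, iteratively grown resource is an assumption about how the step would be automated.

```python
# Sketch of STEP 3: counting documents that mention any textual variant
# of a concept. SYNONYMS would be grown iteratively from the pool.
SYNONYMS = {
    "TRIZ": ["TRIZ", "Theory of Inventive Problem Solving", "TIPS",
             "Theory of the resolution of invention-related tasks"],
    "solve problem": ["solve problem", "solve the contradiction",
                      "overcome the problem"],  # illustrative variants
}

def count_concept(documents, concept):
    """Number of documents mentioning at least one variant of the concept."""
    variants = [v.lower() for v in SYNONYMS[concept]]
    return sum(any(v in doc.lower() for v in variants) for doc in documents)

docs = [
    "We adopt TIPS to generate solutions.",
    "The Theory of Inventive Problem Solving supports designers.",
    "Fuzzy logic handles the uncertainty of the ratings.",
]
n = count_concept(docs, "TRIZ")
```

The first two documents are counted for TRIZ even though neither contains the literal keyword, which is exactly the recall gain that the semantic expansion is meant to deliver.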

Case Study
In order to test the efficacy of the proposed methodology, we replicated the manual classification of the solutions (i.e. strategies) to improve FMEA from our previous survey (Spreafico et al., 2017) and compared the obtained results. That previous survey was made on a pool of 286 documents, 177 scientific papers (165 from academia and 12 from industry) and 109 patents (23 from academia and 86 from industry), collected from scientific DBs (i.e. Scopus) and international patent DBs (i.e. Espacenet). Table 4 presents the considered classification of the main problems (Applicability, Cause and Effects Chain representation, Risk Analysis and Problem Solving) and the solutions identified from the literature to improve them, with the total number of documents related to each of them. In order to make an objective evaluation, we applied the proposed methodology to the same pool of documents of our previous survey. We then executed two different tests: TEST 1 aimed to investigate the ability of the proposed methodology to automatically identify the features of interest for the analysis, and TEST 2 to evaluate the ability of the methodology to classify the documents according to those features.

TEST 1. Identifying the Features
During this first test, we used simple generic keywords consisting of common nouns (e.g. FMEA, Failures, Guidelines) and we considered only one semantic structure, where the keywords are used as objects in relation to the verbs automatically provided by the Parser. The test is considered positive if the Parser is able to suggest relations referring to the features (Problems) of the previous survey. As further confirmation, we manually checked the sentences of the documents provided by the Parser for each relation, in order to verify their adherence to the relative feature. Table 5 collects, for each previously determined feature, the used keywords, the considered relations and the pertinent sentences. As a first confirmation of the goodness of the methodology, we can see from the table that for each feature one pertinent sentence has been determined.
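The manual check of TEST 1 can be sketched as follows: given a suggested relation (verb plus keyword), list the pool sentences containing it together with the document ID, so adherence to the feature can be verified by hand. The matching below is a plain substring test over lowercased text, a simplification of the Parser's syntactic matching (and, as the example shows, it misses inflected forms such as "identifies").

```python
# Sketch of the TEST 1 verification: retrieve sentences (with their
# document IDs) that contain a given verb + keyword relation.
def sentences_for_relation(pool, verb, keyword):
    """pool: dict mapping document ID -> list of sentences."""
    phrase = f"{verb} {keyword}".lower()
    return [
        (doc_id, sent)
        for doc_id, sents in pool.items()
        for sent in sents
        if phrase in sent.lower()
    ]

# Hypothetical two-document pool for illustration.
pool = {
    "D001": ["The method identifies failure modes early."],
    "D002": ["We identify failure modes via behavior modeling."],
}
hits = sentences_for_relation(pool, "identify", "failure modes")
```

Only D002 is returned: the inflected form in D001 escapes the naive substring match, which is one reason a syntactic parser (matching lemmas and relations rather than surface strings) is used instead.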

Table 5 (excerpt). Feature | Keyword | Relation | Pertinent sentence:

… | Failures | Identify failure modes | "The proposed method uses behavior modeling to map control functions to physical entities and identifies failure modes" (Kmenta and Ishii, 1998)
New methods for Failure Causes | Failures | Identify failure causes | "It is a methodological tool which allows identifying and describing the failures scenarios for a given product or service, At the same time, ..., identifies the causes" (Laaroussi et al., 2007)
Statistical methods | Method | Approach statistical methods | "Failure prognostics has been approached via a variety of techniques ranging from probabilistic / statistical methods" (Abbas and Vachtsevanos, 2009)
Requirements-based criteria | Requirements | Identify requirements | "The first starts by identifying functional requirements and continues with performing a safety analysis, such as a Fault Tree Analysis, in order to identify non-expected system behavior" (Guo and Liggesmeyer, 2013; U.S. Patent No. 13,181,681)
Economic criteria | Costs | Analyze costs | "exemplary graphical user display showing cost analyzed and prioritized failure modes using the data" (Conchieri et al., 2009; U.S. Patent No. 11,859,199)
Historical data | Data | Are historical data | "The information used can be historical data, theoretical analysis, expert opinions and the attitude of interested parties" (Petrovic et al., 2014)
Qualitative criteria | Analysis | Perform qualitative analysis | "performing a sequence based qualitative risk analysis to identify a plurality of safety critical" (Guo and Liggesmeyer, 2013; U.S. Patent No. 13,181,681)
Results representation | Failures | Improve failures representation | "The purpose of scenario-based FMEA is to improve the representation of failures" (Kmenta and Ishii, 2000)
New methods | Method | Adopt TRIZ method | "To solve contradiction in order to improve service quality, we can adopt TRIZ method" (Wirawan and Ayu, 2014)
Use FMEA for other purposes | FMEA | Use FMEA for … | "Using FMEA for early robustness analysis of Web-based systems" (Zhou et al., 2012)

TEST 2: Counting the Results
During the second test, we evaluated the ability of the methodology to classify the documents according to the determined features. In order to check this functionality, we used specific keywords for each feature and we manually checked the sentences provided for the main linguistic relations between the keywords (and their modifiers) and other terms. Each sentence considered pertinent to a certain feature was traced back to the corresponding document through its univocal ID. For instance, in order to identify the documents dealing with the feature "Anticipate the analysis", we used the keyword "Anticipate", we analyzed the relations "Anticipate DURING …", "Anticipate AND/OR prevent …", etc., and we checked the identified sentences from the documents which, for the sake of brevity, are reported in the table with an associated number that can be found in the legend. Table 6 collects the achieved results: the features, with the number of related documents from the previous survey, the used keywords, the selected linguistic relations, the pertinent documents (references and total number) for each feature, and the index of efficacy (expressed as a percentage). Among the pertinent documents are, e.g., Arcidiacono and Campatelli (2004), Bertelli and Loureiro (2015), Braglia et al. (2007) and Bell et al. (1992).
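The index of efficacy in Table 6 is expressed as a percentage, but its exact definition is not spelled out here; the formula below (relative change in the number of documents found by the methodology versus the previous manual survey) is therefore an assumption, and the counts used in the example are hypothetical.

```python
# Hedged sketch of the TEST 2 "index of efficacy": relative change, in
# percent, of the number of pertinent documents found automatically
# versus the previous manual survey. The definition is an assumption.
def efficacy_index(found_by_methodology, found_manually):
    return round(
        100.0 * (found_by_methodology - found_manually) / found_manually
    )

# Hypothetical counts: 29 documents found automatically vs 15 manually
# would yield the +93% reported for the best-performing feature.
delta = efficacy_index(29, 15)
```

Under this reading, a positive index means the automatic query retrieved more pertinent documents than the manual survey, as reported for "Historical data".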

Discussion of the Results
Both tests achieved positive results, confirming the validity of the proposed methodology both in identifying the features and in classifying the documents according to them.
During the first test, all the features of the previous survey were identified by using simple generic keywords, not requiring specific knowledge about FMEA. The results arising from the second test are also encouraging, even if with different levels of efficacy of the queries depending on the case: for some of them (e.g. Historical data), the keyword "Historical" in relation to the word "Datum" was sufficient to determine many more documents compared to the previous survey (+93%), while in other cases (e.g. Economic Criteria) we did not obtain the same success.
Only for specific searches was knowledge about FMEA required: e.g. to collect all the documents dealing with FMEA anticipation, the network of relations generated from the more obvious keyword (i.e. "Anticipate") was not sufficient, so we also had to use the keywords "Design" and "Manufacturing", which represent the two main phases during which FMEA can be anticipated. This strategy can be considered for searching the more abstract features, which underlie a wider interpretation and are expressed through different textual forms; e.g. to identify the documents dealing with the integrations with "Statistical methods", the keyword "probability" proved to be much more useful than the more obvious "statistic". Consequently, the choice of the most suitable features of analysis can influence the efficacy of the proposed methodology: the "better" ones are able to easily suggest a great number of possible lexical declinations to be used as keywords, so increasing the recall of the results.

Conclusions
In this paper, a semi-automatic methodology to build FMEA surveys, involving a home-built syntactic Parser, has been introduced, and the results of its test on a pool of 286 documents, divided between 177 scientific papers and 109 patents, replicating a previous manual work of analysis, have been reported. As a result of the tests, the proposed methodology proved to be useful both to automatically determine the main features of interest to be analyzed within the pool and to classify the documents according to them, while saving a considerable amount of time.
Furthermore, the quantitative level achieved by the analysis (recall) can in most cases be higher than that obtained manually, even if the knowledge about FMEA of the executor can be crucial, especially for checking possible misunderstandings related to the specific jargon. However, this fact does not seem to preclude novices from using the methodology and reaching good results.
In particular, the use of the methodology to identify new features can increase the depth of the analysis, also in addition to the features manually pre-determined for particular exigencies. The automatic classification of the documents can instead be a useful tool, especially for non-expert users, since the major part of the results (> 60%) has been achieved by using generic and easily conceivable keywords, even if a supervised selection of homogeneous and alternative features for the analysis can increase the precision of the analysis.

Conflict of Interest
The authors confirm that there is no conflict of interest to declare for this publication.