Testing a New Structured Tool for Supporting Requirements’ Formulation and Decomposition

The definition of a comprehensive initial set of engineering requirements is crucial to an effective and successful design process. To support engineering designers in this non-trivial task, well-acknowledged requirement checklists are available in the literature, but the actual support they provide is debatable. Indeed, engineering design tasks involve multifunctional systems, characterized by a complex map of requirements affecting different functions. Aiming at improving the support provided by common checklists, this paper proposes a structured tool capable of allocating different requirements to specific functions.


Introduction
Identifying and formalizing engineering requirements is deemed critical to achieving effective and efficient product development processes. To this end, requirements are typically presented as the technical description of the objectives that characterize the design process from the very beginning [1][2][3][4][5][6]. Requirements guide the designer through the exploration of the design space, playing an active role across the design activities of analysis, synthesis, and evaluation [7,8]. In this context, requirement checklists are simple tools that support designers in building the technical specification [3,5,9].
A comprehensive review of the literature on requirements' checklists is beyond the scope of this work, but it is possible to state that requirements management is a quite specific topic, as there are handbooks specifically tailored to support scholars and practitioners in activities such as requirements refinement and analysis [10]. Likewise, the selection of elicitation techniques and approaches has attracted increasing attention in recent years, e.g., [11]. However, most contributions in the literature address requirements, their elicitation, and their management from the perspective of the Information Technology domain.

• The few studies that have verified checklists have not provided evidence that they yield rigorous and complete design specifications, especially for checklists suited to systematic design processes pivoting on the German-school approach. In these processes, the handling of requirements is not limited to the formal fulfilment of previously expressed needs. Indeed, in these contexts it is also important to follow acknowledged research findings on how to avoid premature fixation, which can hinder the creativity of design outcomes as early as the formulation of the design task [29].

• Although different kinds of approaches and guidelines can be found in the literature (e.g., on how to write requirements [30] or how to identify stakeholders [31]), design checklists constitute a distinct group of tools aimed at stimulating the generation of new requirements. However, available checklists fundamentally rely on the designer's ability, rather than guiding the designer with an appropriate instrument to properly formulate the requirements for each function of the system.
In this respect, omitting the formulation of requirements for some main functions inevitably leads to additional iterations between the conceptual design and task clarification phases.

• The formulation of requirements currently lacks a benchmark and, although metrics have been established, there is no agreed and shared way to assess them in the domain.
This highlights a gap in the literature and justifies the need to introduce and test a system that systematically guides designers in the formulation of requirements, with the objective of maximizing the quality of the design specification while focusing on the main functions expected from the system to be designed. Indeed, it is widely acknowledged in the literature that abstraction is a key element in fostering creativity by counteracting design fixation [26,32]. Accordingly, functions are acknowledged to enable system representations characterized by a high level of abstraction in engineering design contexts [33][34][35][36][37]. Therefore, it is assumed here that a potential new tool for supporting requirement formulation should take into account at least the main functions of the technical system to be designed. Unfortunately, notwithstanding the many approaches and tools available in the requirements engineering and/or software engineering literature, there is still no instrument that can be directly applied to physical products while exploiting the advantages offered by both acknowledged checklists and abstraction processes.
To address these gaps jointly, the paper presents an experimental verification of the impact of a proposed new tool for formulating requirements, comparing its outcomes with those obtained through a conventional checklist. Moreover, to obtain information about the perceived task load when using the tool, a structured survey was administered to the subjects involved in the experiment. Such information is deemed useful to get a full overview of the pros and cons of the considered tool, spanning both measured performance and users' perception.
The paper is structured as follows. Section 2 describes the checklist considered in this study, together with the proposed tool and the investigation method. Section 3 presents the obtained results, while Section 4 discusses them, together with the limitations of the work, its expected impact, and future developments. Finally, conclusions are drawn in the last section.

The Proposed Tool: A Structured Matrix for Requirements' Formulation
As reported in Section 1, the proposed tool allows designers to consider the main functions of the product simultaneously and to allocate different categories of requirements to each function. The initial set of functions to be processed can be extracted by means of existing checklists. Indeed, the tool can implement any of the checklists available in the engineering design literature and/or specifically proposed by engineering designers, e.g., for particular application fields [38]. However, as mentioned, the tool is conceived as a support particularly suited to systematic procedures for the design of physical products. Consequently, its validity in other contexts, e.g., information technology and/or software engineering, is not guaranteed.
Appl. Sci. 2020, 10, 3259 4 of 22
Moreover, the proposed tool allows its users to distinguish between wishes and demands. Indeed, although apparently similar, these two types of information are regarded by designers as very different indications [3]. Wishes are targets that guide the designer in exploring the design space but do not provide a specific value (e.g., "maximize speed" or "minimize power consumption"). By contrast, demands provide precise values to be reached, e.g., "the speed must be greater than 10 m/s" or "the power consumption must be lower than 20 kW".
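The difference between the two kinds of information can be made concrete with a minimal Python sketch. It assumes a hypothetical representation (the `Wish` and `Demand` classes are illustrative names, not part of the proposed tool) in which a wish carries only a direction of improvement, so it can rank alternatives but never declare one acceptable, while a demand carries a comparator and a threshold that yield a pass/fail verdict. The values reuse the examples quoted above.

```python
import operator
from dataclasses import dataclass

COMPARATORS = {">": operator.gt, "<": operator.lt}

@dataclass(frozen=True)
class Wish:
    """Target with a direction but no value, e.g. "maximize speed"."""
    parameter: str
    direction: str  # "maximize" or "minimize"

    def __post_init__(self):
        if self.direction not in ("maximize", "minimize"):
            raise ValueError("direction must be 'maximize' or 'minimize'")

    def prefers(self, a: float, b: float) -> bool:
        """A wish can only rank alternatives: True if value a beats value b."""
        return a > b if self.direction == "maximize" else a < b

@dataclass(frozen=True)
class Demand:
    """Requirement with a precise value to reach, e.g. "speed > 10 m/s"."""
    parameter: str
    comparator: str  # ">" or "<"
    threshold: float
    unit: str

    def __post_init__(self):
        if self.comparator not in COMPARATORS:
            raise ValueError("comparator must be '>' or '<'")

    def is_met(self, value: float) -> bool:
        """Pass/fail check of a measured value against the threshold."""
        return COMPARATORS[self.comparator](value, self.threshold)

speed_wish = Wish("speed", "maximize")
speed_demand = Demand("speed", ">", 10.0, "m/s")
power_demand = Demand("power consumption", "<", 20.0, "kW")

print(speed_wish.prefers(12.0, 11.0))  # True
print(speed_demand.is_met(12.0))       # True
print(power_demand.is_met(25.0))       # False
```

In this reading, a wish only orders candidate solutions, which is why it steers exploration without closing it.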
As shown in Figure 1, the proposed tool takes the form of a matrix, which can be implemented in an electronic spreadsheet, thus allowing easy updates, sharing, and modifications. More precisely, each pair of rows represents a different entry from a reference checklist, further subdivided into design wishes and demands. This makes the tool highly versatile, so that requirements' checklists using different types of triggers can be used alternatively or in combination. The groups of columns, instead, represent the main functions of the product. No specific concept of function is imposed here, and designers can use the one most suited to their understanding, purpose, and/or context. Accordingly, the definition of the main functions, i.e., what the product should do irrespective of specific solutions, is the very first step to be taken before formulating demands and wishes. After identifying the main functions, the designer can formulate and then allocate each requirement by specifying:
• Which kind of item is affected by the requirement (e.g., geometry, kinematics), simply by placing the requirement in the corresponding pair of rows.
• Whether the requirement is a wish or a demand, simply by placing the requirement in the corresponding row.
• Which function is affected by the requirement, simply by placing it in the corresponding group of columns.
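The layout just described can be sketched as a simple data structure, assuming a hypothetical `RequirementMatrix` class in which a cell is addressed by (checklist item, wish/demand row, function). The function names and sample requirements are invented for illustration; the actual tool is a spreadsheet, not code.

```python
from collections import defaultdict

ROW_KINDS = ("wish", "demand")  # the two rows of each checklist entry

class RequirementMatrix:
    """Hypothetical model of the structured matrix: each checklist entry spans a
    pair of rows (wish, demand) and each main function a group of columns."""

    def __init__(self, checklist_items, functions):
        self.items = list(checklist_items)
        self.functions = list(functions)
        # (item, row kind, function) -> requirements placed in that cell
        self.cells = defaultdict(list)

    def allocate(self, text, item, kind, function):
        """The row pair fixes the item, the row fixes wish/demand, and the
        column group fixes the affected function."""
        if item not in self.items:
            raise ValueError(f"unknown checklist item: {item!r}")
        if kind not in ROW_KINDS:
            raise ValueError(f"kind must be one of {ROW_KINDS}")
        if function not in self.functions:
            raise ValueError(f"unknown function: {function!r}")
        self.cells[(item, kind, function)].append(text)

matrix = RequirementMatrix(["Geometry", "Kinematics"], ["Clean teeth", "Play audio"])
matrix.allocate("Head length must be lower than 30 mm", "Geometry", "demand", "Clean teeth")
matrix.allocate("Minimize handle diameter", "Geometry", "wish", "Clean teeth")
for cell, reqs in matrix.cells.items():
    print(cell, reqs)
```

Rejecting unknown items, rows, or functions mirrors the fact that a requirement can only be written into a cell that exists in the matrix.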
It is important to recall that, as claimed in the Introduction, the proposed tool does not provide any support for the formulation of functions, but simply keeps track of those formulated by the designer. The structured matrix is in fact expected to support designers in building the design specification through the formulation of requirements and/or the decomposition of initial needs into requirements, which is commonplace in (complex) systems engineering. In this respect, it is important to highlight that the proposed tool focuses on the fuzzy front end of the design process, unlike other approaches developed within the requirements engineering discipline (e.g., [39][40][41]). Some of these approaches aim at providing structured tools to better manage the evolution of requirements along the different phases of the design process (e.g., [39]), or to model the decomposition in a function-based hierarchical fashion (e.g., [40]). By contrast, the proposed tool considers functions only to better focus attention on the most important features of the system to be designed, while striving to ensure an adequate level of abstraction in order to avoid premature fixation [42]. In other words, the proposed tool supports a first decomposition of requirements, but only at a first, relatively high level of abstraction (i.e., that of function-based conceptual design in the German approach). Indeed, by identifying the main functions and associating groups of columns with them, it is possible to visualize how many requirements are ascribable to each function. Similarly, by associating the checklist entries with the rows of the matrix, it is also possible to visualize whether a specific item is sufficiently populated by requirements.
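Counting requirements along both axes of the matrix is what makes under-populated functions or checklist items visible. A minimal sketch, assuming a hypothetical flat list of allocations; the item "Energy", the function names, and the requirements are illustrative only:

```python
from collections import Counter

# Hypothetical allocations: (requirement, checklist item, row kind, function)
allocations = [
    ("Head length below 30 mm", "Geometry", "demand", "Clean teeth"),
    ("Minimize handle diameter", "Geometry", "wish", "Clean teeth"),
    ("Bristle oscillation above 100 Hz", "Kinematics", "demand", "Clean teeth"),
    ("Speaker fits inside the handle", "Geometry", "demand", "Play audio"),
]
checklist_items = ["Geometry", "Kinematics", "Energy"]
functions = ["Clean teeth", "Play audio", "Customize settings"]

# Requirements per group of columns (function) and per pair of rows (item).
per_function = Counter(func for _, _, _, func in allocations)
per_item = Counter(item for _, item, _, _ in allocations)

# Empty column groups or row pairs point to functions/items worth revisiting.
empty_functions = [f for f in functions if per_function[f] == 0]
empty_items = [i for i in checklist_items if per_item[i] == 0]

print(dict(per_function))  # {'Clean teeth': 3, 'Play audio': 1}
print(empty_functions)     # ['Customize settings']
print(empty_items)         # ['Energy']
```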
Additionally, due to the fuzziness of this phase of the engineering design process, additional functions may be identified during the formulation of requirements. The tool allows designers to add these functions and to formulate additional requirements, as well as to re-allocate those already formulated; hence, possible iterations between function definition and requirements' formulation are supported, especially if the former was not accurate.
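The iteration described above, adding a function discovered late and moving an existing requirement under it, can be sketched as follows. The dictionary layout, the helper names `add_function` and `reallocate`, the item "Signals", and the sample data are assumptions for illustration.

```python
# Hypothetical cells: (checklist item, row kind, function) -> requirements
cells = {("Signals", "wish", "Clean teeth"): ["Play daily news while brushing"]}
functions = ["Clean teeth"]

def add_function(functions, name):
    """A function identified late becomes a new group of columns."""
    if name not in functions:
        functions.append(name)

def reallocate(cells, functions, text, src, new_function):
    """Move a requirement to the same item and row under another function."""
    if new_function not in functions:
        raise ValueError(f"unknown function: {new_function!r}")
    cells[src].remove(text)  # KeyError/ValueError if src does not hold it
    item, kind, _ = src
    cells.setdefault((item, kind, new_function), []).append(text)

add_function(functions, "Provide audio content")
reallocate(cells, functions, "Play daily news while brushing",
           ("Signals", "wish", "Clean teeth"), "Provide audio content")
print(functions)
print(cells)
```

Keeping the item and row fixed during a move reflects that re-allocation changes only the affected function, not the nature of the requirement.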

The Checklist of Pahl and Beitz
For the scope of the work described in this paper and its comparison with a benchmark, the authors considered the checklist by Pahl and Beitz for conceptual design (hereinafter PBCL). This checklist guides the exploration of requirements by administering a set of stimuli to the user. The stimuli cover different categories of product features, which can be briefly summarized as follows:
To illustrate how the PBCL stimuli are formulated, two examples from the category "specific features of the system" are presented below:
• Geometry: Size, height, breadth, length, diameter, space requirement, number, arrangement, connection, extension.
• Kinematics: Type of motion, direction of motion, velocity, acceleration.

Experimental Set Up
The experimental set-up can be schematically represented as in Figure 2, where the required task is to translate initial product information into a comprehensive set of engineering requirements.
More specifically, since it was not possible to simulate a real situation in which the designer can actively interact with the stakeholders, a set of initial needs was formulated for experimental purposes. For this reason, the effects of the tool in a real design process and in terms of design creativity cannot be examined. Indeed, the actual scope of this work is limited to identifying both the pros and cons of the tool through the considered set of metrics.
Notwithstanding the limited design activity under consideration, the experimental conditions can still be deemed sufficiently realistic. Indeed, firms often investigate what is expected from a new product before starting the design process. In other words, for the purposes of the present study, requirements elicitation precedes and provides inputs to the design tasks under consideration. Therefore, the study matches those circumstances in which the people or teams in charge of defining the needs to be satisfied differ from those formulating the requirements. As a result, the present experiment mirrors this kind of condition, i.e., the designer works on a list of functions provided by someone else and has to build an accurate design specification to allow the design process to move forward. In practice, this list can be incomplete and/or expressed in non-technical terms, e.g., because it has been developed from commercial investigations, which underlines how critical the requirements' formulation (or, in other circumstances, decomposition) tasks simulated here are. In this case, the tool is also expected to provide additional support compared to a simple checklist, because it makes it possible to identify potentially missing functions. However, this capability is not tested in the experiment, whose reach is limited to the transformation of the given entries into design requirements.
A detailed description of the experimental set-up is reported in the following paragraphs, while details about data analysis are provided in Section 2.4.

Phase 1
Thirty-six students of the Master of Science (MS) degree in Mechanical Engineering at the University of Florence, Italy, attending the module "Product Development and Engineering Design", constituted the convenience sample for the present experiment. For the purposes of the experiment, they are considered as belonging to the same population. In particular, at the time of the experiment, the subjects had never attended any lecture on requirements engineering or on requirement formulation in general. Moreover, the module they were attending was the only one focused on engineering design and product development in their whole study plan. Accordingly, their previous experience in handling requirements can be considered negligible and largely similar. For that reason, a short briefing (thirty minutes overall) was held with the students before the experiment, to explain how the tools (PBCL and the matrix) work and how to use them to extract and/or formulate engineering requirements. Then, to perform the experiment, the sample was randomly subdivided into two groups: 18 students worked with the classical PBCL (control group), while the remaining 18 worked with the structured matrix implementing the set of PBCL items (analysis group). Using the same set of items for both the control and the analysis groups ensures that the proposed matrix is the only treatment tested and investigated in the experiment. Subjects were asked to work individually, so as not to mix the outcomes of individual thinking and to maximize the amount of data for the subsequent analysis.
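The paper does not state how the random subdivision was performed; one common way to obtain two equal groups is to shuffle the subject list and split it in half, as in this sketch. The subject codes and the seed are hypothetical.

```python
import random

def split_groups(subjects, seed=None):
    """Shuffle the subjects and split them into two equal-size groups."""
    rng = random.Random(seed)   # seeded generator for a reproducible split
    shuffled = list(subjects)   # copy, so the input order is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

subjects = [f"S{i:02d}" for i in range(1, 37)]  # 36 hypothetical subject codes
control, analysis = split_groups(subjects, seed=2020)  # PBCL vs. matrix
print(len(control), len(analysis))  # 18 18
```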
The instructions provided to the students are listed in Table 1. As shown in Figure 2, in the first phase of the experiment both the control group and the analysis group received a paper sheet with the information reported in Table 2, together with another sheet including a short description of the items composing the checklist. Additionally, the subjects in the control group received a spreadsheet file to collect their requirements in a simple generic column, whereas the analysis group received a spreadsheet file containing the proposed structured matrix. Subjects from both groups used their own laptops to collect requirements.

Table 1. Instructions provided to the subjects.

Step | Control Group | Analysis Group
Step 1 | Read the product information on the paper sheet. | Read the product information on the paper sheet.
Step 2 | Consult the checklist and try to extract a comprehensive list of engineering requirements from the available information. | Establish the main functions of the system and put them in the related boxes of the matrix provided with the spreadsheet file (if necessary, add new columns).
Step 3 | Put your requirements in the specific cells of the spreadsheet. You can establish both terms and values as you need, in order to insert any additional and purposeful information. | Consult the items listed in the first column of the matrix (checklist parameters) and try to extract a comprehensive list of engineering requirements from the available information.
Step 4 | - | Put your requirements in the specific cells of the spreadsheet, associating them with one or more of the previously defined main functions. You can establish both terms and values according to your needs, in order to insert any additional and purposeful information.

Phase 2
The experiment was based on an academic case study, "A device for teeth and mouth hygiene (e.g., an innovative toothbrush)", characterized by the information reported in Table 2. More specifically, the expected product has no particular restrictions, but it is intended to be used in the same context as conventional toothbrushes. Accordingly, the subjects could envisage any possible product profile, with the sole indication to comply with the initial target needs listed in Table 2. It is therefore possible to state that the same stakeholders as for conventional toothbrushes would be targeted, but their list was neither established a priori nor presented to the subjects involved in the experiment.
Then, subjects were asked to exploit the tools (i.e., the checklist for the control group, and the matrix embedding the same checklist items for the analysis group) to extract and/or find the design information needed for the engineering development of the product. Moreover, they were free to extrapolate any additional data they needed to formulate engineering requirements. For example, as for "Size" (Table 2), the objective states that the system can be bigger than existing ones, but no limit is given in terms of maximum allowable size. It was up to each subject (likely influenced by the specific tool used) to establish and state the missing information.

Phase 3
At the end of the time allocated for the test (60 min), students were asked to save their spreadsheet file, naming it with a prefix (CL for the control group and MTX for the analysis group) followed by a code that kept their identity anonymous. The students then saved their files in a previously shared folder (on the Dropbox® service). Finally, the collected files were copied and saved in a different folder, not accessible to the students.

Phase 4
After a break of about 15 min, an email was sent to each subject, providing a link to a Google form structured according to an adapted version of the NASA Task Load Index (TLX) [43][44][45], whose parameters, related questions, and values are reported in Table 3. In particular, before the TLX items, the very first question of the form asked respondents to state whether they belonged to the control group or the analysis group.
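Table 3 (with the adapted parameters) is not reproduced here, and the paper does not state how the responses were aggregated. As an assumption, the sketch below uses the six standard NASA-TLX subscales rated on a 0-100 scale and the unweighted "Raw TLX" score, i.e., their plain mean; the adapted form may differ in both respects. The response values are hypothetical.

```python
from statistics import mean

# Standard NASA-TLX subscales (assumption: the adapted form in Table 3 may differ).
SUBSCALES = ("mental", "physical", "temporal", "performance", "effort", "frustration")

def raw_tlx(ratings):
    """Unweighted ("Raw TLX") score: plain mean of the six 0-100 subscale ratings."""
    missing = [s for s in SUBSCALES if s not in ratings]
    if missing:
        raise ValueError(f"missing subscales: {missing}")
    for s in SUBSCALES:
        if not 0 <= ratings[s] <= 100:
            raise ValueError(f"{s} rating out of range: {ratings[s]}")
    return mean(ratings[s] for s in SUBSCALES)

# Hypothetical form response; the first field records the group, as in the form.
response = {"group": "analysis", "mental": 70, "physical": 10, "temporal": 60,
            "performance": 40, "effort": 65, "frustration": 35}
print(response["group"], round(raw_tlx(response), 1))
```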
TLX is often used in design-related experiments with stringent time constraints because it is deemed useful to extract critical information about the perceived task load, which, in turn, affects the willingness to repeat the design tasks under investigation. Examples of applications in design research where time limits were imposed and/or similar durations of experiments were recorded can be found in [46][47][48].
Students were asked to fill in the form immediately after the break, but no time limit was imposed. All responses were received within 20 min.

Metrics for Requirements' Formulation Evaluation
The very first parameter used to compare the control group and the analysis group is the number of generated requirements. However, according to Roozenburg and Eekels [25], other important characteristics of a requirement list can be identified:
• Validity: the capability of the requirement to discriminate the extent to which a certain objective is achieved;
• Completeness: the capability of the whole specification to cover all the objectives in the different domains where stakeholders are involved;
• Operationality: the capability of the requirement to make the achievement of the objective measurable, to avoid subjective evaluations of (partial) solutions;
• Non-redundancy: the capability of the specification to be free from duplicates;
• Conciseness: the capability of the specification to contain just the meaningful requirements, without neglecting important facets to be taken into account (not too many, not too few);
• Practicability: the capability of the requirements to be tested, e.g., with simulations or by exploiting available information.
However, Conciseness and Practicability were not considered in this study, as they do not apply to its specific circumstances. Indeed, conciseness cannot be evaluated a priori, i.e., without a clear idea of the expected result. The evaluation of practicability is also troublesome, since the actual viability of requirements for testing and simulation depends on the expected tests and simulations, which were omitted to simplify the design test. Hence, the lack of information on these aspects made it impossible to apply this metric as well.
Two of the considered metrics (Validity and Operationality) can be evaluated by relying on the Element-Name-Value (ENV) fractal model [49]. Briefly, in the ENV model the sentence "A tomato is a vegetable, round and red" can be modelled as follows: "Tomato" is the element (E), which has a parameter named (N) color, which has a particular value (V): red. More specifically, well-formulated requirements, i.e., those in which the key parameters E and N are present, are considered valid, as they clearly point out the goal to be fulfilled. Conversely, a requirement lacking either of the two parameters (E or N) is considered invalid here. The assessment of operationality, instead, requires the value V to be made explicit by the formulation (i.e., if a value is missing or, more generally, not measurable for the requirement as formalized, the requirement is not operational). As a more specific example, consider the following requirement extracted (and translated into English) from the case study under consideration.
"The device" (Element) "can be powered on" (Name of the parameter: activation) "with just one hand" (Value). This requirement is valid because the Element and the Name are present. Additionally, it is also operational because the Value parameter is present too.
The Non-redundancy metric can be applied simply by counting how many times the same (or a similar) requirement appears in each specific set, including requirements that use different words but target the same ENV triad.
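Validity, operationality, and non-redundancy can all be read off an ENV coding of each requirement. The sketch below assumes a hypothetical `ENVRequirement` record (None marks a missing part) and reproduces the lower-triangular redundancy matrix described for Table 6. Matching triads by exact, case-folded strings is only a crude stand-in for the coders' judgment, which also caught differently worded duplicates.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ENVRequirement:
    """Requirement coded with the Element-Name-Value triad (None = missing)."""
    element: Optional[str]
    name: Optional[str]
    value: Optional[str]

    @property
    def valid(self) -> bool:
        # Valid: both the element (E) and the parameter name (N) are stated.
        return bool(self.element) and bool(self.name)

    @property
    def operational(self) -> bool:
        # Operational: a value (V) is stated. Presence is only a proxy here;
        # the coders also judged whether the value is measurable.
        return bool(self.value)

    def key(self) -> tuple:
        # Crude normalisation (assumption): case-fold and trim each part.
        return tuple((p or "").strip().casefold()
                     for p in (self.element, self.name, self.value))

def redundancy_matrix(reqs):
    """Lower-triangular 0/1 matrix in the spirit of Table 6: cell [i][j], j < i,
    is 1 when requirement i shares its ENV triad with earlier requirement j."""
    keys = [r.key() for r in reqs]
    return [[int(j < i and keys[i] == keys[j]) for j in range(len(keys))]
            for i in range(len(keys))]

def count_redundant(matrix):
    """Number of requirements that repeat at least one earlier requirement."""
    return sum(1 for row in matrix if any(row))

reqs = [
    ENVRequirement("the device", "activation", "with just one hand"),
    ENVRequirement("the device", "size", None),  # valid, not operational
    ENVRequirement("The device", "Activation", "with just one hand "),  # same triad
]
print([(r.valid, r.operational) for r in reqs])
# [(True, True), (True, False), (True, True)]
m = redundancy_matrix(reqs)
print(m)                   # [[0, 0, 0], [0, 0, 0], [1, 0, 0]]
print(count_redundant(m))  # 1
```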
As for the Completeness metric, the set of stakeholders involved in the life cycle of the specific product was identified (see Table 4). The set of stakeholders was extracted by following an "a posteriori" approach, i.e., by analyzing the entire set of requirements generated by all students in both groups. Accordingly, a "complete" specification generated by a specific subject should take into account each of these classes of stakeholders, or at least the different life cycle phases in which these stakeholders are relevant (last column of Table 4).
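One simple way to turn the stakeholder coding into a completeness figure is the share of Table 4 stakeholder classes touched by at least one requirement. The paper does not give an explicit formula, so this ratio is an assumption; per-requirement codes follow the 0/1 convention of Table 5, and the sample specification is hypothetical.

```python
STAKEHOLDERS = ("Seller", "Buyer", "User", "Beneficiary", "Dentist",
                "Transporter", "Maintainer", "Disposal guy")  # Table 4

def completeness(coded_rows):
    """Share of stakeholder classes touched by at least one requirement.
    coded_rows: one dict per requirement, stakeholder -> 0/1 as in Table 5.
    Returns (share, sorted list of stakeholders never addressed)."""
    covered = {s for row in coded_rows for s in STAKEHOLDERS if row.get(s) == 1}
    return len(covered) / len(STAKEHOLDERS), sorted(set(STAKEHOLDERS) - covered)

spec = [  # hypothetical coding of one subject's specification
    {"User": 1, "Beneficiary": 1},
    {"User": 1, "Maintainer": 1},
    {"Seller": 1},
]
score, missing = completeness(spec)
print(score)    # 0.5
print(missing)  # ['Buyer', 'Dentist', 'Disposal guy', 'Transporter']
```

Coverage of life cycle phases (the last column of Table 4) could be computed the same way by coding phases instead of stakeholder classes.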
Subjects were not informed about the metrics applied to the results, in order to avoid biased outcomes.

Table 2. Initial information provided to the subjects in terms of target goals for the product.

Design Objective | Description of the Objective
Hygienic aspects | Performance on this item should be comparable to that of existing products of the same type. Nevertheless, the system to be designed should at least comply with standard safety requirements, in order to avoid problems in the oral cavity.
Comfort | No particular performance is expected in terms of comfort.
Aesthetic pleasantness | The ideal solution should be pleasant and perfectly integrated into the environment where it is normally located.
Versatility of use | Performing multiple cleaning operations within the oral cavity is expected. Besides teeth cleaning, the tongue, palate, and gingival interstices should be considered.
Cleaning effectiveness | Teeth-cleaning effectiveness must be maximized.
Ease of use | The system should be as easy to use as possible.
Multiple functions | Besides the cleaning functionalities, the system should provide other types of functionality. In particular, the system should allow the user to listen to music and/or daily news.
Customization | The system should be configurable according to the user's preferences.
Size | The system can be bigger than existing products with similar functionalities.
Energy saving | No particular restrictions are provided in terms of energy consumption.

Table 4. The stakeholders identified from the entire set of requirements generated by subjects.

Stakeholder | Description | Life cycle phase
Seller | The person who handles the product until it is out of the store | Sale
Buyer | The person who is interested in buying the product and who makes the selection |
User | The person handling the product from the first moment after the purchase until its disposal (except for maintenance intervals) | Use/Benefit
Beneficiary | The person receiving the benefits provided by the product (not necessarily the same person as the user) |
Dentist | The person who is indirectly affected by the benefits provided by the product | Other
Transporter | The person who transports the product | Transport
Maintainer | The person handling and managing the product during maintenance operations | Maintenance
Disposal guy | The person handling and managing the product during disposal operations | Disposal

Data Collection and Management
To manage the data, spreadsheets were collected by group, so that the results could be classified by the subject participating in the data collection process. In each subject's spreadsheet, each requirement was analyzed to verify the presence of the three parameters of the ENV triad. Moreover, for each requirement, the stakeholders actually affected (Table 4) were identified by means of additional columns in the same worksheet (Table 5).
As for the non-redundancy metric, a specific matrix was introduced in each worksheet for each subject (see Table 6).

Table 5. Table used to assess the requirement sets produced by each subject. "1" or "0" is attributed to each of the ENV parameters if it is present or missing, respectively. "1" is assigned to each stakeholder actually affected by the requirement.

Table 6. Non-redundancy assessment matrix. The value "1" is entered in those cells (in the lower triangle of the matrix) where a redundancy has been identified between the requirement in the row and the requirement in the column, i.e., when they share the same ENV triad.

Two evaluators, with more than 10 years of experience in engineering design research, coded the students' outcomes. Each of them performed all the evaluations needed (as in Tables 5 and 6) for all the requirements, subjects, and groups (analysis and control). Inter-rater reliability was assessed with Cohen's kappa [50,51], as the comparison concerns two evaluators and binary data. When kappa scores were below 0.66 for a specific metric, the coding results were shared and discussed, and the coding was repeated until all kappa scores were above the established threshold. As the coded data are essentially categorical variables, no averaging between the two coders was possible. To ensure full reliability of the considered dataset, the analysis considers only the requirements for which both raters provided consistent answers. In other words, in the following, and particularly in Section 3, a requirement was considered valid, operational, non-redundant, and ascribable to a specific life cycle phase only if this emerged from the analysis of both coders.
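For two raters and binary codes, Cohen's kappa compares the observed agreement p_o with the agreement p_e expected by chance from each rater's marginal proportions: kappa = (p_o - p_e) / (1 - p_e). A minimal sketch with the 0.66 threshold used above follows; the two code vectors are hypothetical.

```python
def cohens_kappa(a, b):
    """Cohen's kappa for two raters' binary (0/1) codes of the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty code lists of equal length")
    n = len(a)
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    pa1, pb1 = sum(a) / n, sum(b) / n  # share of "1" codes per rater
    p_expected = pa1 * pb1 + (1 - pa1) * (1 - pb1)
    if p_expected == 1:
        return 1.0  # both raters used one identical category throughout
    return (p_observed - p_expected) / (1 - p_expected)

THRESHOLD = 0.66  # below this, codings were discussed and repeated

rater1 = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
rater2 = [1, 1, 0, 1, 1, 1, 1, 0, 1, 0]
k = cohens_kappa(rater1, rater2)
print(round(k, 3), "recode" if k < THRESHOLD else "ok")
```

Here the raters agree on 8 of 10 items, yet kappa stays well below 0.66 because both code "1" 70% of the time, so much of that agreement is expected by chance.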
The adopted metrics and the related coding activity make it possible to distinguish suitable requirements from those that are not (unequivocally) interpretable or measurable. This filtering process starts from the whole set of students' outputs and progressively applies the criteria of validity and operationality, which apply to every requirement. Then, the specifications (the lists of requirements provided by each subject participating in the experiment) are further reduced to remove duplicates (non-redundancy). The residual requirements of each responding subject constitute that subject's individually generated design specification. As completeness is a property of the whole specification, the degree of completeness is measured on the lists of requirements with reference to the criterion described in Table 4.
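The progressive filtering can be sketched as a small pipeline. The sketch makes two assumptions. First, each requirement carries both coders' 0/1 codes for validity, operationality, and non-redundancy, with the redundancy code already derived from a matrix like Table 6. Second, following the consensus rule above, a criterion holds only if both coders assigned 1. The `CodedRequirement` record and the helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

CRITERIA = ("valid", "operational", "non_redundant")  # applied in this order


@dataclass
class CodedRequirement:
    """Hypothetical record: one requirement of one subject, with the 0/1
    codes assigned by each of the two evaluators."""
    text: str
    coder_a: Dict[str, int]
    coder_b: Dict[str, int]


def agreed(req: CodedRequirement, criterion: str) -> bool:
    # No averaging between coders: the criterion holds only if both gave 1.
    return req.coder_a[criterion] == 1 and req.coder_b[criterion] == 1


def filter_specification(reqs: List[CodedRequirement]) -> Tuple[Dict[str, int], Dict[str, float]]:
    """Apply the criteria progressively; for each stage, report the surviving
    count and its fraction of all generated requirements."""
    total = len(reqs)
    counts: Dict[str, int] = {}
    surviving = list(reqs)
    for criterion in CRITERIA:
        surviving = [r for r in surviving if agreed(r, criterion)]
        counts[criterion] = len(surviving)
    fractions = {c: (n / total if total else 0.0) for c, n in counts.items()}
    return counts, fractions


ok = {"valid": 1, "operational": 1, "non_redundant": 1}
reqs = [
    CodedRequirement("R1", ok, ok),
    CodedRequirement("R2", ok, {**ok, "valid": 0}),           # coders disagree on validity
    CodedRequirement("R3", {**ok, "operational": 0}, ok),     # coders disagree on operationality
    CodedRequirement("R4", ok, {**ok, "non_redundant": 0}),   # flagged redundant by one coder
]
counts, fractions = filter_specification(reqs)
print(counts)     # {'valid': 3, 'operational': 2, 'non_redundant': 1}
print(fractions)  # {'valid': 0.75, 'operational': 0.5, 'non_redundant': 0.25}
```

Every fraction is computed against the total number of generated requirements, not against the previous stage. This matches how Tables 8 to 10 are normalized.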
All the individually generated tentative specifications, together with their progressive refinements towards the final set of selected design requirements, constitute the data points of the distributions obtained for each checklist. These distributions are analyzed through descriptive statistics (mean and standard deviation) to highlight the performance of each checklist and to compare the checklists against each other.
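The control-group counts from the flattened Table 7 can be summarized with Python's standard `statistics` module, which serves as a check on the reported figures. The reported St. Dev. (4.09) is reproduced by the sample (n − 1) estimator, whereas the population estimator gives about 3.98. This suggests, although the text does not say so, that sample standard deviations are reported. The same snippet recomputes the control-group retained shares quoted in this section.

```python
import statistics

# Requirements generated by each control-group subject (Table 7).
control_counts = [4, 7, 5, 10, 8, 8, 7, 6, 8, 15, 7, 6, 6, 10, 11, 9, 21, 13]

total = sum(control_counts)
mean = statistics.mean(control_counts)
sd = statistics.stdev(control_counts)  # sample (n - 1) standard deviation
print(total, round(mean, 2), round(sd, 2))  # 161 8.94 4.09

# Requirements retained by the control group after each cumulative filter:
# validity; validity + operationality; validity + operationality + non-redundancy.
retained = {"valid": 130, "operational": 117, "non_redundant": 113}
shares = {stage: round(100 * n / total) for stage, n in retained.items()}
print(shares)  # {'valid': 81, 'operational': 73, 'non_redundant': 70}
```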

Results
The following subsections report the results obtained for each of the four metrics described in the previous section. The overall dataset is available in [52]. The data are arranged in an MS Excel file subdivided into two distinct spreadsheets. The first spreadsheet (named "Without matrix") contains the raw data and the coding related to the control group; the second (named "With matrix") contains the data related to the analysis group. Both sheets are structured like Table 5, with an additional column indicating redundancy, in which the value 1 marks those requirements that, according to an evaluator, had already been listed by the same subject.
At the end of this section, the results from the TLX survey are also shown.

Overall Productivity
Descriptive statistics are reported in Table 7, which shows that the proposed tool generally yields more populated sets of requirements. Additionally, the boxplot in Figure 3 reveals that the control group (using the checklist) achieved a narrower distribution, while the adoption of the matrix led to a wider variety in the number of requirements generated by each subject. A Mann-Whitney test [53] performed on the data reported in Table 7 confirms that the distributions are significantly different (p-value = 0.009). Here and in the following, the significance level is set to 0.05 as a rule of thumb.

Number of Requirements Generated by Each Participant | Total | Mean | St. Dev.
Control group | 4 7 5 10 8 8 7 6 8 15 7 6 6 10 11 9 21 13 | 161 | 8.94 | 4.09
Analysis group | 13 15 7 6 17 17 20 9 28 10 21 15 17 16 11 11 5

Validity
Table 8 presents the results on the validity of the requirements, obtained by removing the non-valid requirements from the sets shown in Table 7. The fractions are reported for each subject, relative to the total number of generated requirements. This highlights the effectiveness of the treatment relative to the control condition.
The analysis of validity reduced the total number of requirements to 130 (184), i.e., 81% (74%) of the initial quantity, for the control (analysis) group.
Table 8. Fraction of valid requirements for each subject, after the filtering process aimed at eliminating non-valid requirements.

Fraction of Valid Requirements' Formulation for Each Participant Mean St. Dev.
Control group 1 1 0.8 0.9 0.9 0.9 0.

A two-sample Mann-Whitney test performed on the data reported in Table 8 shows that the distributions can be considered significantly different (p-value = 0.016).

Operationality
Table 9 collects the fractions of valid and operational requirements by subject and by group. After the non-valid requirements were filtered out, a further screening was performed on the resulting set to also eliminate the non-operational ones. The remaining number of requirements for each subject was divided by the total number of generated requirements (see Table 7), yielding the percentages listed in Table 9. The analysis of validity and operationality reduced the total number of requirements to 117 (158), i.e., 73% (64%) of the initial quantity, for the control (analysis) group. The dataset of Table 9 yields the boxplot in Figure 5, which shows a more balanced reduction for both groups than Figure 4. The Mann-Whitney test performed on valid and operational requirements shows that the differences between the two distributions are statistically significant (p-value = 0.016).

Non-Redundancy
Table 10 shows the results for non-redundancy, obtained by removing redundant requirements from the set of valid and operational ones. For each subject, the values reported in Table 10 were obtained by dividing the resulting number by the total number of generated requirements (see Table 7). The analysis of validity, operationality, and non-redundancy reduced the total number of requirements to 113 (90), i.e., 70% (36%) of the initial quantity, for the control (analysis) group. As the mean values in Table 10 show, this last filtering step had a considerable effect on the requirements of the analysis group, highlighting a possible problem caused by the proposed tool. This evidence is even clearer in the boxplot shown in Figure 6, and, as expected, the Mann-Whitney test confirms a clear difference between the two distributions (p-value < 0.001).
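The Mann-Whitney comparisons in this section rank all observations from both groups together and check whether one group's ranks are systematically higher. The sketch below is a minimal pure-Python version that uses the normal approximation with a tie correction and no continuity correction. It is not meant to reproduce the exact p-values reported above, which depend on the software and options used. For example, SciPy's `scipy.stats.mannwhitneyu` applies a continuity correction by default and may compute exact p-values for small samples without ties.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected,
    no continuity correction). Returns (U_x, U_y, p)."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    n = n1 + n2
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    # Tie correction: sum of (t^3 - t) over each group of t tied values.
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        raise ValueError("all observations are tied; the test is undefined")
    z = (u1 - n1 * n2 / 2) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u1, u2, p


# Toy example: every value of the first sample is below the second.
u_x, u_y, p = mann_whitney_u([1, 2, 3], [4, 5, 6])
print(u_x, u_y, round(p, 3))
```

With only three observations per group, the normal approximation is rough. For samples as small as these, an exact test is usually preferable.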

Completeness
Unlike the other metrics, the results on completeness are presented from a requirement-centered perspective. This means that the results refer to the pooled collection of all the requirements formulated by the subjects participating in the study. Accordingly, Table 11 shows the occurrences of the life cycle phases mentioned in the formulated requirements. These life cycle phases were extracted a posteriori, based on the experiment outcomes. The percentages were obtained by referring to the total counts obtained by each group. The values listed in Table 11 are represented in the radar plot of Figure 7, which shows how the requirements are distributed across the classes of stakeholders/solution lifecycle stages they refer to.
Table 11 and Figure 7 show that the analysis group (using the matrix) obtained a more balanced allocation of the requirements. In particular, the focus on the Use/Benefit phase has been reduced. The more balanced allocation is also confirmed by a lower variance (sample/population) of the analysis group percentages (0.035/0.030) compared with that of the control group (0.053/0.044). To verify whether the two distributions differ, a χ² test was performed. It yielded a p-value of 0.009, indicating a significant change in the distribution across life cycle phases when moving from the control group to the analysis group.
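The completeness comparison relies on two computations: a χ² test on a 2 × k contingency table (groups × life cycle phases) and the variance of each group's shares across phases. Table 11's counts do not appear in this section, so the sketch below uses hypothetical counts. It also assumes that the reported variances are computed on fractions rather than on percentage points. The helper names are illustrative. A library routine such as SciPy's `chi2_contingency` would normally be used instead.

```python
import math
import statistics
from typing import Sequence, Tuple


def chi_square_2xk(row_a: Sequence[int], row_b: Sequence[int]) -> Tuple[float, int]:
    """Pearson chi-square statistic for a 2 x k table of counts
    (rows: groups, columns: life cycle phases). Returns (statistic, df)."""
    table = [list(row_a), list(row_b)]
    row_totals = [sum(r) for r in table]
    col_totals = [a + b for a, b in zip(row_a, row_b)]
    grand = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("empty rows/columns give zero expected counts")
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    return stat, len(col_totals) - 1


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail chi-square probability, 1 - P(df/2, x/2), using the series for
    the regularized lower incomplete gamma function. Adequate for moderate
    statistics; very small p-values lose precision."""
    if x <= 0:
        return 1.0
    a, z = df / 2, x / 2
    term = total = 1.0 / a
    n = 1
    while term > total * 1e-15:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def share_variances(counts: Sequence[int]) -> Tuple[float, float]:
    """Sample and population variance of each phase's share of requirements."""
    shares = [c / sum(counts) for c in counts]
    return statistics.variance(shares), statistics.pvariance(shares)


# Hypothetical counts per life cycle phase (not the study's data).
control = [12, 60, 8, 15, 10, 8]
analysis = [20, 45, 15, 20, 18, 12]
stat, df = chi_square_2xk(control, analysis)
print(round(stat, 2), df, round(chi2_sf(stat, df), 3))
print(share_variances(control), share_variances(analysis))
```

A lower variance of the shares means that the requirements are spread more evenly across phases. This is the sense in which the text calls the analysis group's allocation "more balanced".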

Perceived Task Load
The students' answers to the NASA-TLX questionnaire (see Section 2.3, Phase 4) were collected and analyzed, leading to the boxplots shown in Figure 8.

Appl. Sci. 2020, 10, x FOR PEER REVIEW 15 of 22

Figure 8. Boxplots of the answers gathered for each parameter (see Table 3) of the submitted form.
While Time Demand, Mental Demand, and Frustration levels are perceived equally by the subjects of both groups (although with slightly different distributions), the other parameters show some differences. In particular, Figure 8 reveals that the analysis group perceived a higher effort in performing the task. The Mann-Whitney test revealed statistically significant differences (p-value < 0.05) for Physical Demand. Similarly, the differences observed for Perceived Effort led to a p-value of 0.047, which is below the 0.05 threshold and thus supports the higher load perceived by subjects using the proposed matrix.
However, subjects from the analysis group apparently felt more satisfied with their work, as shown by the Performance boxplot in Figure 8. The statistical reliability of the observed difference was verified with a Mann-Whitney test (p-value = 0.047).

Obtained Results
By looking at the total number of generated requirements, the differences between control group and analysis group are quite evident (see Figure 3 and Table 7). Indeed, it is possible to state that the proposed matrix led subjects to formulate more populated tentative specifications.
When the "validity" filter is applied, the requirements produced by both groups undergo a considerable reduction, and the difference between the groups is statistically significant (see Section 3.2). Similarly, with the additional "operationality" filter, both groups undergo a similar average reduction, yet the differences between the analysis group and the control group become more apparent and remain statistically significant (according to the Mann-Whitney test). This indicates that the proposed tool failed to support the formulation of valid and operational requirements, as it performed worse than the simple checklist. However, the largest negative effect can be observed for non-redundancy (see Section 3.4). An analysis of the spreadsheets produced by the subjects of the analysis group made it quite evident that the problem lay in the repetition of common requirements in different columns (each representing a function). Indeed, when dealing with a requirement concerning the whole system (e.g., overall admissible size), subjects simply copied the same requirement into many different columns. Additionally, some students failed to understand the difference between wishes and demands, and many of them repeated the same requirement in the two distinct rows. These problems can also partially explain the higher effort perceived by the analysis group (see Section 3.6), especially in terms of physical demand (e.g., due to unnecessary copy/paste actions).
When it comes to the Completeness parameter, the two groups similarly focused on the Use/Benefit phase (see the radar plot in Figure 7), but the analysis group produced a more equally distributed allocation of requirements. The data are not sufficient to confirm the positive effect of the matrix in that sense, but provide a first hint for future activities devoted to a more comprehensive investigation about the observed performance.
As for the perceived task load, Mental Demand, Time Demand, and Frustration levels are perceived equally by the subjects of both groups (although with slightly different distributions). In contrast, the other parameters show some differences, which highlight possible issues with the tool. In particular, the Mann-Whitney test revealed statistically significant differences (p-value < 0.05) for Physical Demand and Perceived Effort.
Overall, mainly thanks to the higher satisfaction perceived by the analysis group (see Section 3.6), it is possible to state that this first application of the proposed matrix had some positive effects on the subjects involved in the experiment. However, this modest benefit was overshadowed by some major problems, especially in terms of redundant requirements. Nevertheless, the identified shortcomings, which led subjects to repeat formulated requirements in the matrix, provide useful hints for future developments, as reported in the next subsection.

Limitations and Future Developments
A limitation of the work concerns the use of only one reference checklist both for building the matrix and for supporting the control group. This choice inevitably limits the general validity of the considerations presented in the previous section. Therefore, a first hint is to repeat the same study with additional checklists.
Another limitation of the work comes from the predefined input delivered to subjects. This input was needed to ensure a shared vision of the task, but it inevitably created a condition that can differ meaningfully from a real case. In addition, the considered sample of participants (engineering students from the same institution) implies further limitations. Besides the effects of their limited expertise, which may prevent the results from applying to more experienced practitioners, it is also unclear to what extent the results are affected by contextual aspects. Consequently, additional experiments extended to students from different institutions, disciplines, and/or countries are necessary. These could provide statistically significant evaluations and, in turn, more robust indications for selecting the most suitable checklist to be used in combination with the proposed tool.
One of the developments planned for the near future is an upgrade of the matrix that targets the useless repetition of common requirements. For example, in addition to the columns representing the main functions, the matrix could include an additional column named "common (or cross-cutting) requirements", listing the requirements that apply to the whole system, i.e., those common to all functions. Indeed, many subjects repeated the same requirement in all the columns generated for each function, which produced a very large number of requirements that were mere duplicates. A specific column for requirements shared by all or multiple functions (e.g., aesthetic appearance of the whole product) is expected to restrict each function column to the requirements strictly related to that specific function (e.g., a requirement just for dental cleaning). Testing this improved version of the tool is the next experimental activity the authors plan to perform, and its results are expected to be compared with those described in the present paper.
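The proposed upgrade can be illustrated with a small data model. This is a hypothetical sketch, not the authors' implementation. It assumes that the matrix is stored as cells keyed by (checklist item, demand/wish, column) and that duplicates can be detected by identical requirement text. `build_matrix`, `hoist_common`, the function names, and the toothbrush requirements are all invented for the example. The row labels merely imitate Pahl-and-Beitz-style checklist headings.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple

COMMON = "common (cross-cutting)"  # proposed extra column
Cell = Tuple[str, str, str]  # (checklist item, "demand" or "wish", column)


def build_matrix(entries: Iterable[Tuple[str, str, str, str]]) -> Dict[Cell, Set[str]]:
    """entries: (function column, checklist item, kind, requirement text)."""
    matrix: Dict[Cell, Set[str]] = defaultdict(set)
    for function, item, kind, requirement in entries:
        matrix[(item, kind, function)].add(requirement)
    return matrix


def hoist_common(matrix: Dict[Cell, Set[str]], functions: Iterable[str],
                 min_share: Optional[int] = None) -> Dict[Cell, Set[str]]:
    """Move a requirement to the COMMON column when it is repeated, for the
    same checklist item and kind, under at least `min_share` function
    columns (default: all of them)."""
    functions = set(functions)
    if min_share is None:
        min_share = len(functions)
    occurrences: Counter = Counter()
    for (item, kind, column), reqs in matrix.items():
        if column in functions:
            for req in reqs:
                occurrences[(item, kind, req)] += 1
    result: Dict[Cell, Set[str]] = defaultdict(set)
    for (item, kind, column), reqs in matrix.items():
        for req in reqs:
            shared = column in functions and occurrences[(item, kind, req)] >= min_share
            result[(item, kind, COMMON if shared else column)].add(req)
    return result


# Illustrative toothbrush entries (hypothetical functions and requirements).
functions = ["remove plaque", "provide grip"]
matrix = build_matrix([
    ("remove plaque", "Geometry", "demand", "overall length <= 200 mm"),
    ("provide grip", "Geometry", "demand", "overall length <= 200 mm"),
    ("remove plaque", "Kinematics", "demand", "head reaches rear molars"),
])
hoisted = hoist_common(matrix, functions)
for cell, reqs in sorted(hoisted.items()):
    print(cell, sorted(reqs))
```

The whole-system size requirement ends up stated once in the cross-cutting column, while the function-specific requirement stays under its own column. Setting `min_share` below the number of functions would also cover requirements shared by only some functions, as the text suggests.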
The lack of a shared definition of function also led to different numbers of columns, thus sometimes worsening the non-redundancy problem observed for the matrix. Future experiments should be repeated by providing a shared definition of function (e.g., that of Pahl and Beitz, if dealing with engineering students), in order to verify the presence of any advantage. Nevertheless, the selection of the reference definition of function has to be performed carefully [19].
A further development of the work could be the evaluation of the "quality" of the obtained specifications, in terms of the possible help or hindrance they provide in exploring the design space. On the one hand, abstract and generic requirements may help avoid undesired fixations [42,54] on specific designs; on the other hand, they can lead to numerous design iterations. Conversely, overly detailed specifications may have the opposite effects. Therefore, future studies should further investigate the design outcomes of tasks whose requirements have been obtained with specific checklists. To this purpose, well-known creativity or idea generation effectiveness metrics could be adopted [35][36][37][55][56][57].
Additionally, the possibility to list the main functions of the product directly in the matrix allows a rapid implementation of systematic conceptual design methods based on functions, e.g., [3,7,26,58]. Indeed, the functions listed in the matrix's columns can be considered as the starting point for the design space exploration with purposeful schematic representations [59]. Therefore, the proposed tool also paves the way for future studies targeting the development of a comprehensive methodology to support designers in the fuzzy-front-end of the engineering design process.

Expected Impact
Despite the highlighted issues, the proposed matrix constitutes a preliminary tool to help novices better exploit the potential of checklists. Upgraded versions of the matrix (e.g., with an additional column for cross-cutting requirements and/or with improved checklist items) can be exploited by both novice designers and design lecturers to support the non-trivial translation of general product information into more detailed engineering design specifications.
Moreover, the experimental approach, the obtained results, and the research hints provided in this section are expected to promote new research on how to support designers in formulating comprehensive sets of requirements. Accordingly, the experimental approach of the present work can be repeated with any kind of design checklist and/or any experimental tool supporting the definition of design specifications.
Therefore, this work has the potential to lead to future studies on both requirement checklists and the development of structured methodological tools, where one of the potential outcomes is the definition of a framework that links the design phases of product planning and conceptual design.

Conclusions
With specific reference to systematic design procedures addressing physical products, the original aspects, key findings, and main lessons learned from the present contribution are listed below and further stressed in the remainder of this section:
• A new tool has been introduced and tested to focus on the main product functions and subsequently support requirements' formulation by exploiting the advantages of both requirement checklists and abstraction.
• Unlike existing checklists, e.g., [9], and design methods, the tool is independent of any specific meaning and interpretation of "function". Thanks to this independence from the adopted definition of function, the tool is a candidate for use in different design disciplines and application fields.
• A procedure to evaluate the metrics proposed in [28] has been fine-tuned, which made it possible to achieve considerable agreement between two expert coders. From a methodological viewpoint, the procedure relies on the ENV concept to assess validity and operationality, and on the identification of stakeholders and lifecycle phases to evaluate completeness.
• A previously unavailable performance benchmark has been established for the quality of the requirement formulation process, in terms of the metrics proposed in [28].
• The results, and specifically the outcomes in terms of non-redundancy, have highlighted the need to separate requirements applicable to the whole design from those ascribable to specific functions. Previous design studies have failed to capture this aspect or, at least, to disclose it explicitly. Consequently, the handling of requirements that concern the design as a whole deserves specific research attention, with particular reference to requirements' formulation or decomposition.
• The results in terms of perceived effort stress the importance of usability, beyond effectiveness, when developing new design methods and tools.
As mentioned, the main original element of the paper is that it targets the problem of formulating useful and usable requirement lists, which has largely been overlooked so far. More specifically, the present study investigates the effects of increasing the level of structure in the use of standard checklists for design requirements, in the context of the engineering design of physical products. In this research domain, the paper presents the study of a candidate tool for supporting the formulation of comprehensive sets of engineering design requirements. The tool consists of a structured matrix in which the groups of columns represent the main functions of the desired product and the rows are the checklist items, further subdivided into design wishes and demands.
To test the matrix, a sample of thirty-six engineering students was asked to formulate a set of requirements, starting from an initial set of general product information. Eighteen students constituted the control group and were asked to use the checklist developed by Pahl and Beitz for conceptual design purposes. In contrast, students in the analysis group were asked to use the proposed tool (implementing the same requirement checklist) by following the corresponding procedure.
The outcomes of both groups have been assessed by means of well-acknowledged metrics specifically developed for design requirements [28]. More specifically, the metrics adopted in this paper were Validity, Operationality, Non-redundancy, and Completeness. The outcomes of the experiment highlighted relevant problems for the proposed tool, especially in terms of the generation of redundant requirements. However, the likely cause of the problem has been identified, along with the means to develop an upgraded, more efficient version of the tool. In particular, the absence of a column for common (or cross-cutting) requirements led students to uselessly repeat the same requirement in the different columns representing the different functions. Conversely, the tool provided moderately positive results in terms of completeness of requirements. The authors also assessed the students' workload by means of the NASA TLX method, which revealed some shortcomings ascribable to the tested matrix, especially in terms of physical demand. The authors suppose that the matrix's drawback related to non-redundancy is partly linked to these workload issues.
Despite the clearly unsatisfactory outcomes, it is worth noting that the obtained numerical results are, to the authors' best knowledge, among the few quantitative examples of checklist application. These outcomes might represent a first benchmark for comparisons with experiments that use requirement checklists or, more generally, that study the process of requirements' formulation or decomposition. It is also worth stressing that many studies are required to acquire critical knowledge of effective ways to support requirements' formulation. All these studies inevitably require a major contribution from humans and designers to extract effective strategies. At present, the process of accurate requirements' formulation is not sufficiently standardized to allow the development of intelligent systems capable of replicating it. This contrasts with other design stages that are increasingly supported by smart systems and big data, notably requirements elicitation, user-oriented need identification, and task clarification [60][61][62][63][64], which precede requirement formulation in systematic design approaches. These stages might become fully automated design activities whose outputs, however, cannot in turn be processed by a smart system; the human resources saved could then be allocated to handling the extracted needs and functions in the subsequent design phases, especially requirements' formulation.
As studies on the systematic and repeatable formulation of requirements are expected in the future, the contribution of the present paper includes:
• a structured way to prepare an experimental set-up;
• a case study that can be exploited for comparing results;
• a procedure to assess requirements' formulation or decomposition according to well-established criteria, which achieved a satisfactory Inter-Rater Reliability between experienced coders;
• a proposal to assess the workload caused by the use of tools for requirements' formulation, whose outcomes can be compared with those reported here.
Some methodological limitations ascribable to this work are worth remarking too. First, the limited number of participants, although sufficient for achieving statistical evidence, is not sufficient for extracting generally valid considerations. Moreover, the experimental subjects are students, thus potentially not representative of the industrial environment. Additionally, the initial design brief is simple and the leveraged product displays a low degree of complexity (toothbrush). Therefore, the applicability of these results to the industrial context is not immediate, as this requires subjects with a higher level of expertise and additional investigations tailored to specific engineering domains.
Nevertheless, beyond what was suggested here, the authors expect that the presented research approach and the experimental protocol might be reused by peers in order to perform additional analysis and/or to validate the results presented in this paper. Accordingly, several hints for further research activities are provided in this paper. In particular, it has been proposed to investigate the actual impact of the quality of the design specification on the design space exploration. In this context, it has been suggested that the possibility given by the proposed tool of listing the main functions of the system to be designed could support designers in applying systematic conceptual design procedures.