Journal Pre-proof Harmonization and standardization of malnutrition screening for all adults – A systematic review initiated by the Norwegian Directorate of Health

Background & Aims: The Norwegian Directorate of Health has identified a need to harmonize and standardize the malnutrition screening practice in Norwegian hospitals and primary health care settings, in order to provide a seamless communication of malnutrition screening along the patient pathway. Our aim was to perform a systematic review of the validity and reliability of screening tools used to identify risk of malnutrition across health care settings, diagnoses or conditions and adult age groups, as a first step towards a national recommendation of one screening tool.

Methods: A systematic literature search for articles evaluating validity, agreement, and reliability of malnutrition screening tools, published up to August 2020, was conducted in: MEDLINE, Embase, APA PsycInfo, Cinahl, Cochrane Databases, Web of Science, Epistemonikos, SveMed+, and Norart. The systematic review was registered in PROSPERO (CRD42022300558). For critical appraisal of each included article, the Quality Criteria Checklist by The Academy of Nutrition and Dietetics was used.

Results: The review identified 105 articles that fulfilled the inclusion and exclusion criteria. The most frequently validated tools were Mini Nutritional Assessment short form (MNA), Malnutrition Universal Screening Tool (MUST), Malnutrition Screening Tool (MST), and Nutritional Risk Screening 2002 (NRS-2002). MNA, MST and NRS-2002 displayed overall moderate validity, and MUST low validity. All four tools displayed low agreement. MST and MUST were validated across health care settings and age groups. In general, data on reliability was limited.

Conclusions: The screening tools MST and NRS-2002 displayed moderate validity for the identification of malnutrition in adults, of which MST is validated across health care settings. In addition, MNA has moderate validity for the identification of malnutrition in adults 65 years or older.


Introduction
Malnutrition is a common condition and can be both a cause and a consequence of disease. Malnutrition also negatively affects the prognosis of disease. The Global Leadership Initiative on Malnutrition (GLIM) criteria are international consensus-based diagnostic criteria for malnutrition (1). The first step in diagnosing malnutrition in GLIM is screening to identify individuals at risk of malnutrition using a validated screening tool (1). Thus, tools used for screening are not diagnostic tools, but identify persons at risk of becoming malnourished or persons who already are malnourished. Several malnutrition screening tools are available, but they vary widely in validity, reliability, and generalizability, which affects their ability to accurately identify adults who are malnourished and in need of nutritional treatment (2).
Internationally, a wide array of screening tools is used to identify the risk of malnutrition. Discontinuities of care in the transition between different levels of the health care system have been identified as risk factors for increased readmission rates and adverse medical events (4). Harmonizing and standardizing the screening method may lead to more accurate screening practice and allow comparison of the risk of malnutrition (5) during the patient's journey from one health care setting to another (6). It may also facilitate a national overview of the burden of malnutrition and its distribution across care settings and regions (6).
The Norwegian Directorate of Health has therefore identified a need to harmonize the malnutrition screening practice across health care settings, diagnoses or conditions and adult age groups. Such a harmonization is in line with former work in other countries. The British Association for Parenteral and Enteral Nutrition (BAPEN) has since 2003 implemented the Malnutrition Universal Screening Tool (MUST) as the recommended screening tool (7;8), providing comparable data across care settings (8). The American Academy of Nutrition and Dietetics (9) recommended the Malnutrition Screening Tool (MST) to screen adults for malnutrition regardless of their age, medical history or setting (2;10). However, one specific malnutrition screening tool with outstanding validity, reliability, and strong supportive evidence across all care settings among adults has not yet been identified.
As a first step towards a national recommendation of one screening tool for the risk of malnutrition in the entire Norwegian health care system, we conducted a systematic review as an update and extension of the systematic review performed by Skipper et al. (10), by adding more recent literature, revising the comparison standard (including GLIM), and expanding with a Scandinavian literature search. The aim of this systematic review was to summarize the validity of commonly used screening tools to identify risk of malnutrition across health care settings, diagnoses or conditions, and adult age groups.

Materials and Methods
The PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) statement was used as the guideline for the review and reporting (11) to ensure objectivity, transparency, and reproducibility of the process. The systematic review has been registered in PROSPERO (CRD42022300558). For critical appraisal of each included article, the Quality Criteria Checklist (12) by The Academy of Nutrition and Dietetics (9) was used.

Research question and eligibility criteria
The research question was formed using the population, intervention, comparison intervention and outcome (PICO) format, to ensure specificity and relevance to the aim of the project (Table 1). The population criteria for eligibility of studies were adults 18 years or older, any health care settings, and any diagnoses or conditions. Studies were included if they were quantitative validation studies, published in peer-reviewed journals, written in English, Norwegian, Swedish or Danish, and included at least 20 participants for each comparison.
Exclusion criteria were studies using country-specific or modified versions of a tool, tools exclusively consisting of laboratory values and studies only published as abstracts.
The intervention included the 15 common screening tools used in relevant care settings, listed in Table 1. There is no agreed-upon gold standard against which to compare the validity of screening tools (6). Therefore, a set of comparison standards for the validation of screening tools was used, as listed in Table 1. The comparison standards were defined based on well validated "semi-gold standards", and as defined by Skipper et al. (10) in order to facilitate comparison. Furthermore, the GLIM criteria (1) were added as a "semi-gold standard" during the literature review. When used as the sole criterion, BMI was not considered an acceptable gold standard for malnutrition.
The usefulness of a malnutrition screening tool can be measured as the ability to measure the important dimensions of malnutrition in the population in question (content validity), test-retest and inter-observer variation (reliability), and the agreement between the screening tool and the gold standard or semi-gold standard (concurrent validity) (6).
Concurrent validity refers to the ability of the screening tool to identify malnutrition, and can be quantified through: sensitivity (the probability of a positive screening result given that the person is malnourished), specificity (the probability of a negative screening result given that the person is not malnourished), positive predictive value (PPV) (the proportion of true positive screening tests among all positive tests), negative predictive value (NPV) (the proportion of true negative tests among all negative tests), and kappa values (the agreement between tools using Cohen's kappa coefficient). In addition, reliability (consistency of results when using the screening tool) was included in the search. All relevant outcomes are listed in Table 1, and in the complete search strategies in Supplementary Table 1. To be able to harmonize and standardize the malnutrition screening practice for all adults, the tool needed to be validated across adult age groups, health care settings, and diagnoses or conditions.

All identified records were added, sorted, screened for duplicates (using different combinations of fields in preferences), and organized in the EndNote X9 software (Clarivate Analytics). The list of records was independently screened based on title and abstract, and on eligibility criteria identified by the PICO, by two reviewers (THT, IP) blinded to each other's decisions. In the case of disagreement on screening status, consensus was reached between the two reviewers through a third common review.
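
As an illustration of the concurrent validity measures defined above, the following minimal Python sketch computes sensitivity, specificity, PPV, NPV, and Cohen's kappa from a 2x2 table of screening result against comparison standard. The function name, the argument names, and the counts in the usage example are hypothetical and are not taken from any included study; the sketch assumes that every row and column of the 2x2 table is non-empty.

```python
from typing import Dict


def concurrent_validity(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Compute validity and agreement measures from a 2x2 table.

    tp: screened at risk, malnourished according to the comparison standard
    fp: screened at risk, not malnourished according to the comparison standard
    fn: screened not at risk, malnourished according to the comparison standard
    tn: screened not at risk, not malnourished according to the comparison standard
    """
    if min(tp + fn, tn + fp, tp + fp, tn + fn) == 0:
        raise ValueError("every row and column of the 2x2 table must be non-empty")

    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)   # P(positive screen | malnourished)
    specificity = tn / (tn + fp)   # P(negative screen | not malnourished)
    ppv = tp / (tp + fp)           # true positives among all positive screens
    npv = tn / (tn + fn)           # true negatives among all negative screens

    # Cohen's kappa: observed agreement corrected for chance agreement.
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    kappa = (observed - expected) / (1 - expected)

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "kappa": kappa,
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    for name, value in concurrent_validity(tp=40, fp=10, fn=20, tn=30).items():
        print(f"{name}: {value:.2f}")
```

Note that PPV and NPV depend on the prevalence of malnutrition in the study population, whereas sensitivity and specificity do not, which is one reason the four measures are reported separately.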

Systematic literature search
One additional record was identified through the reviews of relevant literature. The tools Nutritional Risk Index (NRI) and Prognostic Nutritional Index (PNI) were excluded during the review process (after the literature search) since both tools exclusively assess laboratory values. There were no articles validating the tool "Ernaeringsjournal" (Norwegian) [translates to "Nutrition journal"].

Review of the evidence and data extraction
The identified records that met the eligibility criteria were systematically reviewed full-text by both reviewers (independently and blinded) according to inclusion and exclusion criteria, quality of evidence, and outcome of interest. One reviewer (IP) extracted the data, and another reviewer (THT) double checked the extracted data. The following data were retrieved from each eligible research article: reference, publication year, quality of evidence, sample size, country, setting, condition/ward/diagnosis, mean/median age, lower age limit for inclusion, intervention tool, comparison tool, and relevant results of sensitivity, specificity, PPV, NPV, correlation coefficient (CC) and concordance (Cohen's kappa values) (Table 2). Each separate performance indicator (sensitivity, specificity, PPV, NPV, agreement (Cohen's kappa)) was evaluated based on pre-defined cut-off values as listed in Table 2 (13;14), while overall validity of each screening tool was determined using an algorithm based on the algorithm developed by Skipper et al. (10).

Quality of evidence
The quality of each included article was critically appraised independently by both reviewers, using the Quality Criteria Checklist of The Academy of Nutrition and Dietetics (12). The reviewers were blinded to each other's results.
The critical appraisal includes issues of inclusion/exclusion, bias, and data collection and analysis. When there was initial disagreement between the researchers on the quality assessment, consensus was reached through a third common review. Each article was graded as positive (+) indicating that the report has clearly addressed the issues, negative (-) indicating that these issues have not been adequately addressed, and neutral (ø) indicating that the report is neither exceptionally strong nor exceptionally weak in quality.

Table 2 either with test-retest or inter-rater reliability of the respective tools. One reviewer (THT) extracted the data from each eligible research article, and the other reviewer (IP) checked the extracted data. The following data were extracted: reference, publication year, sample size, country, setting, condition/ward/diagnosis, mean/median age, lower age limit for inclusion, intervention tool, observer comparison, comparison period, and relevant results of CC, intraclass correlation coefficient (ICC), and agreement coefficients (Gwet's AC1 and Cohen's kappa values). To summarize the evidence, only the agreement coefficients were comparable, and these were interpreted as described in Table 2.
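
To illustrate the two agreement coefficients named above, the following minimal Python sketch computes Cohen's kappa and Gwet's AC1 for two raters who classify the same persons as at risk (1) or not at risk (0). It uses the standard two-rater, two-category forms of both coefficients; the function name and the ratings are hypothetical and do not come from the included studies.

```python
from typing import Sequence, Tuple


def agreement_coefficients(
    rater_a: Sequence[int], rater_b: Sequence[int]
) -> Tuple[float, float]:
    """Return (Cohen's kappa, Gwet's AC1) for two raters and binary ratings (0/1)."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must rate the same non-empty set of persons")

    n = len(rater_a)
    # Observed proportion of persons on whom the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Proportion classified "at risk" by each rater.
    p_a = sum(rater_a) / n
    p_b = sum(rater_b) / n

    # Cohen's kappa: chance agreement from the product of each rater's margins.
    expected_kappa = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected_kappa == 1:
        raise ValueError("kappa is undefined when both raters use a single category")
    kappa = (observed - expected_kappa) / (1 - expected_kappa)

    # Gwet's AC1: chance agreement from the average "at risk" proportion.
    pi = (p_a + p_b) / 2
    expected_ac1 = 2 * pi * (1 - pi)
    ac1 = (observed - expected_ac1) / (1 - expected_ac1)

    return kappa, ac1


if __name__ == "__main__":
    # Hypothetical ratings of ten persons by two raters, for illustration only.
    a = [1, 1, 1, 1, 0, 0, 1, 1, 1, 1]
    b = [1, 1, 1, 0, 0, 1, 1, 1, 1, 1]
    kappa, ac1 = agreement_coefficients(a, b)
    print(f"kappa = {kappa:.2f}, AC1 = {ac1:.2f}")
```

When one category dominates, as in this example, kappa tends to be considerably lower than AC1 for the same observed agreement, so the two coefficients should be compared with that in mind.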

Results
The inclusion of records is summarized in a PRISMA diagram (  Table 3).
The validity (sensitivity, specificity, PPV, and NPV) and agreement (Cohen's kappa) are summarized in Table 3. In addition, validity, agreement, quality, and characteristics of all included studies can be found in the following tables: MNA (Table 4), MST (Table 5), MUST (Table 6), NRS-2002 (Table 7), and Nutritional Form for the Elderly (NUFFE), Nutriscore, Patient generated subjective global assessment short form (PG-SGA-SF), Short nutritional assessment questionnaire (SNAQ), and Simplified nutritional appetite questionnaire (SimplifiedNAQ) (Table 8).

Table 4). Median sample size was 250. Table 3 lists the median sensitivity, specificity, PPV, NPV, and agreement against all references, and against other references than Full MNA. The majority of comparisons (37 comparisons) were done in older adults, and the most common settings were community-dwelling (12), nursing homes (9), and inpatients (10), within a variety of conditions/wards. Risk of bias was summarized as quality of primary research in 34 articles, of which 16 were graded as positive (+) and 18 as neutral (ø). One article was found to report on reliability of the MNA tool, with an inter-rater reliability of 0.31 (15). In conclusion, MNA showed moderate validity and low agreement, and validation studies were limited to the older adult population across health care settings and conditions or wards. The quality of research was positive in 47% of the articles, and data on reliability was limited.

Malnutrition Screening Tool (MST)
MST was validated in 26 articles with a total of 31 comparisons, of which 16 against SGA, nine against PG-SGA, three against Full MNA, two against GLIM, and one against McWhirter (Table 5). Median sample size was 134. Table 3 lists the median sensitivity, specificity, PPV, NPV, and agreement. Of the comparisons, 15 were in populations of 18 years or above, and seven in older adults. The most common comparison settings were inpatients (15) and outpatients (12), within a variety of conditions or wards. The quality of primary research was graded as positive (+) in 17 of the articles and neutral (ø) in nine articles. Six articles were found to report on reliability of MST (16;18-22), with a total of 10 comparisons. The mean inter-rater reliability between comparisons was 0.64 (0.28-0.93) measured in kappa values and 0.8 (0.6-0.9) with Gwet's AC1. In conclusion, MST showed moderate validity and low agreement, and was validated across age groups, health care settings, and conditions or wards. The quality of research was positive in 65% of the articles, and data on reliability was moderate.

Malnutrition Universal Screening Tool (MUST)
MUST was validated in 35 articles with a total of 41 comparisons, of which 21 against SGA, six against PG-SGA, 11 against Full MNA, two against GLIM, and one against a nutrition assessment including body composition and change in body weight over time (Table 6). Table 3 lists the median sensitivity, specificity, PPV, NPV, and agreement. Most of the comparisons were performed in inpatients (26) or outpatients (9), within a variety of conditions or wards. Of the comparisons, 19 were in adult populations, and 15 in older adults. Quality of primary research was graded as positive (+) in 19 articles and neutral (ø) in 16 articles. Reliability was reported in three studies (16-18), with a mean inter-rater reliability between two studies of 0.68 (0.58-0.78). In conclusion, MUST showed low validity and low agreement, and was validated across age groups, health care settings, and conditions or wards. The quality of research was positive in 56% of the articles, and data on reliability was limited.

Nutritional Form for the Elderly (NUFFE)
NUFFE was validated in one article and with one comparison against Full MNA with a sensitivity of 70, specificity of 76, PPV of 81, and NPV of 30 (Table 3, Table 8).

(Table 7). Table 3 lists the median sensitivity, specificity, PPV, NPV, and agreement. Median sample size was 210, and the majority of comparisons (23 comparisons) were done in populations 18 years or above and in older adults (14 comparisons).

(Table 3, Table 8). The validation was performed in a population of 394 oncology outpatients. Quality of primary research was graded as positive (+) in the included article.

Patient generated subjective global assessment short form (PG-SGA-SF)
PG-SGA-SF was validated in three articles with a total of five comparisons, all against PG-SGA (Table 8). The median sample size was 246, and all validations were performed in populations 18 years or above. Table 3 lists the median sensitivity, specificity, PPV, NPV, and agreement. Four comparisons were set in oncology and one in a nephrology ward. It should be noted that three of the comparisons were performed with different cut-off values for risk of malnutrition in the same population. Quality of primary research was graded as positive (+) in all three articles.

Short nutritional assessment questionnaire (SNAQ)
SNAQ was validated in five articles and with a total of six comparisons of which four against SGA, one against GLIM, and one against Full MNA (Table 8). The median sample size was 170, and four validations were performed in inpatients, and one in outpatients. Four of the comparisons were in populations 18 years or above, and two in populations 65 years or above. Table 3 lists the median sensitivity, specificity, PPV, NPV, and agreement. Quality of primary research was graded as positive (+) in three articles and neutral (ø) in two articles.

Simplified nutritional appetite questionnaire (SimplifiedNAQ)
SimplifiedNAQ was validated in six articles and with a total of eight comparisons of which six against Full MNA and two against SGA (Table 8). Median sample size was 180, and all validations were performed in populations above 55, 60 or 65 years of age within different health care settings. Table 3 lists the median sensitivity, specificity, PPV, NPV, and agreement. Quality of primary research was graded as positive (+) in three articles and neutral (ø) in three articles.

Overall validity
For each screening tool, the overall validity was based on the algorithm as shown in Figure 2.

Discussion
In this systematic review, we summarized the validation of malnutrition screening tools for adults (18 years

Comparisons between subgroups were limited due to lack of standardization in the reported description of age (range), setting, diagnosis or condition. In order to recommend one screening tool across health care settings, the instrument should be validated within the different age groups, settings and/or conditions/wards where the tool will be implemented. This was true within a reasonable range for only two of the screening tools: MST and MUST.

Conclusions
The screening tools MST and NRS-2002 display moderate concurrent validity for the identification of malnutrition in adults, of which MST is validated across health care settings. In addition, MNA has moderate validity for the identification of malnutrition in adults 65 years or older.
We thank the working group for the revision of the Norwegian guidelines on prevention and treatment of malnutrition (as appointed by the Norwegian Directorate of Health) for advice during the review process.