INTRODUCTION

In a nation with health care system flaws and rising costs, health care reformers and political leaders are looking to primary care for solutions.1 Primary care providers are charged with improving preventative care, managing chronic diseases, and keeping patients out of hospitals, but face significant challenges, including time pressure, documentation burden, and an increasingly aging population.2,3,4 Organizational factors, including organizational culture and climate, are increasingly recognized as crucial to strengthening the primary care workforce and improving the quality of primary care.5,6,7

Organizational culture in health care is defined as the norms, values, and basic assumptions of a given organization, which drive both the quality of work life and the quality of care.8,9,10 Organizational climate is defined as the collective perception of the organization’s culture and how it impacts personal well-being and functioning.10 Organizational culture and climate are interconnected; an organization’s beliefs and values (culture) govern its members’ experience (climate). Together, they make up the “feel” of an organization.11,12 Unfavorable organizational cultures, such as those characterized by chaotic work environments, time pressures, and lack of control, have been associated with poorer outcomes, including greater provider burnout.2,13 Hierarchical cultures and those resistant to change have also been identified as significant barriers to innovation and implementation of evidence-based practices.14,15,16 In contrast, favorable cultures and climates, such as those that enhance autonomy, promote diversity and inclusion, and facilitate cooperation and collaboration, are associated with better outcomes, including provider well-being and engagement in quality improvement.17,18

Moving forward with research on organizational culture and climate in primary care requires greater attention to measurement. Although many tools to measure culture and climate have been used in health care settings, variation in measure domains and psychometric properties makes it difficult to draw consistent and reliable conclusions.10,19,20 Additionally, measures developed for other types of health care organizations, such as hospitals, may not be valid for primary care.21 Identification of validated measures for primary care will encourage more informed instrument selection in future research and foster greater understanding of the organizational culture and climate of these settings, leading to effective and sustainable ways to achieve organizational changes that improve the quality of care.

We conducted a systematic review to identify measures of organizational culture and climate used in primary care within recent years (2008–2019) and evaluate their psychometric properties. We focused on measures used with primary care practitioners who have direct patient contact, whose experience of culture and climate is likely to have the most direct influence on patient outcomes. We aimed to formulate recommendations on instrument selection based on these findings and suggest directions for future research.

METHODS

Protocol and Registration

This systematic review follows the reporting guidelines set forth by PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses),22 was assessed for quality using the AMSTAR checklist (https://amstar.ca/Amstar_Checklist.php), and has a protocol registered with PROSPERO (CRD 42019133117). Our protocol is based on the Consensus-Based Standards for the selection of health Measurement Instruments (COSMIN) methodology.23,24,25

Eligibility Criteria

Eligibility criteria were determined a priori (Table 1). In summary, included articles met the following criteria: they were published in 2008 or later, used quantitative or mixed methodology, had a study population consisting mostly of primary health care professionals, were set in primary care, and had the target population complete an instrument measuring the general concept of organizational culture or organizational climate. Non-English articles were included; Google Translate was used to translate articles into English.26 Empirical research articles from journals and dissertations were included; cross-sectional, case-control, cohort studies, and clinical trials were also included.

Table 1 Study Eligibility Criteria

Information Sources and Search Strategy

Two health sciences librarians with systematic review experience (BF, HVV) developed searches for the databases PubMed, PsycINFO, HAPI (Health and Psychosocial Instruments), CINAHL (Current Index to Nursing and Allied Health Literature), and Mental Measurements Yearbook. The initial PubMed search was developed (BF) using a combination of MeSH terms and title, abstract, and keyword terms, and was checked against a known set of studies. The search was then adapted to search other databases (HVV). Search strategies, dates, and results for each database were recorded in an Excel workbook and are found in ESM Appendix Table 1. The last database search was conducted on May 10, 2019.

We further manually reviewed bibliographies of articles selected for analysis and conducted a cited reference search using Scopus. Cited reference searches have been found to be a more sensitive search strategy than keyword searches for identifying articles using specific measurement instruments.27 Titles and abstracts of identified citations were initially reviewed by the first author (KSH) and selected citations underwent study selection procedures. EndNote was used to store all citations found in the search process and to check for duplicates. The last cited reference search was conducted on December 17, 2019.

Study Selection

Citations and abstracts were uploaded into DistillerSR (Evidence Partners, Ottawa, Canada) for study selection. Two reviewers at any one time (KSH, JC, AG, RK) independently screened all titles and abstracts. The two reviewers met to discuss and reach consensus on differences, consulting a third reviewer (EAM) as needed. Subsequently, two reviewers (KSH, JC) independently screened the full text of included articles. Differences were discussed and consensus reached, consulting a third reviewer (EAM) as needed. Screening results using a sample of references from a preliminary search yielded a weighted average kappa value of 0.73, indicating moderate interrater reliability.28
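As a minimal sketch of how agreement between two screeners can be quantified, the following computes an unweighted Cohen's kappa for a single include/exclude decision. The function name and the screening decisions are hypothetical, and the weighted averaging across screening questions reported above is not reproduced here.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters' categorical decisions."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(rater_a)
    # Proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical title/abstract screening decisions for ten citations.
reviewer_1 = ["include", "exclude", "exclude", "include", "exclude",
              "exclude", "include", "exclude", "exclude", "exclude"]
reviewer_2 = ["include", "exclude", "exclude", "exclude", "exclude",
              "exclude", "include", "exclude", "include", "exclude"]

kappa = cohens_kappa(reviewer_1, reviewer_2)
print(f"kappa = {kappa:.2f}")
```

Here the reviewers agree on 8 of 10 citations, but because most citations are excluded by both, chance agreement is already 0.58, so kappa is noticeably lower than raw agreement.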

Data Extraction

A graphic representation of the methods used to perform data extraction, risk of bias, and summary of findings is shown in Figure 1. Two reviewers (KSH, JC) independently extracted data for each study using DistillerSR, with differences resolved through discussion and consultation with a third reviewer (EAM) as needed. Study-relevant characteristics extracted included sample size, sample demographics, and setting. Measure-relevant characteristics included measure name, source reference, method of administration, composition (e.g., domains, subdomains, number of items), response options, and method of scoring. Study authors were contacted via email to obtain missing information.

Figure 2 Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flow diagram of the search and screening process.

Evaluation of Methodological Quality of Measurement Properties for Individual Studies

After data extraction, two independent raters trained in psychometrics (KSH, JC) used the COSMIN Risk of Bias checklist to evaluate methodological quality of measurement properties in each study.24 The COSMIN Risk of Bias checklist was originally designed for patient-reported outcome measures (PROMs); minor modifications were made for our target population of professionals (ESM Appendix Table 2). The quality of each measurement property (i.e., structural validity, internal consistency, cross-cultural validity/measurement invariance, reliability, measurement error, criterion validity, hypothesis testing for construct validity, responsiveness) was rated as very good, adequate, doubtful, inadequate, or N/A. Ratings followed the “worst score counts” principle—the lowest rating of any standard for that property was taken as its overall rating. Differences were resolved through discussion and consultation with an expert psychometrician (GES).
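The "worst score counts" principle can be sketched in a few lines. This is an illustrative sketch only: the function name and the example ratings are hypothetical, and the rating labels are taken from the paragraph above.

```python
# Ratings ordered from worst to best, as listed in the text.
RATING_ORDER = ["inadequate", "doubtful", "adequate", "very good"]


def worst_score_counts(standard_ratings):
    """Return the overall rating for one measurement property.

    standard_ratings holds the rating given to each standard of the
    property; "N/A" entries are ignored. The lowest remaining rating
    becomes the overall rating.
    """
    applicable = [r for r in standard_ratings if r != "N/A"]
    if not applicable:
        return "N/A"
    # list.index raises ValueError for labels outside RATING_ORDER.
    return min(applicable, key=RATING_ORDER.index)


# Hypothetical ratings for the standards of one property in one study.
overall = worst_score_counts(["very good", "adequate", "doubtful", "N/A"])
print(overall)  # doubtful
```

A single low-rated standard therefore determines the overall rating, regardless of how many other standards were rated highly.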

Rating Results of Measurement Properties of Individual Studies

Following the rating of methodological quality of measurement properties of each study, two authors (KSH, JC) rated results of measurement properties in each study using Prinsen and Terwee’s criteria for good measurement properties.24 Results were rated as sufficient (+), insufficient (−), or indeterminate (?) (i.e., not enough information available).
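To illustrate how a reported result might be mapped to one of these ratings, the sketch below computes Cronbach's alpha, a common index of internal consistency, and rates it against a cut-off. The function names, the example data, and the 0.70 cut-off are assumptions for illustration; the criteria cited above set the actual rules and may impose further conditions.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha; item_scores is a list of respondents' item lists."""
    if len(item_scores) < 2:
        raise ValueError("need at least two respondents")
    k = len(item_scores[0])
    if k < 2 or any(len(row) != k for row in item_scores):
        raise ValueError("each respondent needs the same k >= 2 items")
    # Sum of per-item sample variances across respondents.
    item_var_sum = sum(variance(col) for col in zip(*item_scores))
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in item_scores])
    if total_var == 0:
        raise ValueError("alpha is undefined when total scores do not vary")
    return k / (k - 1) * (1 - item_var_sum / total_var)


def rate_internal_consistency(alpha, threshold=0.70):
    """Map alpha to '+', '-', or '?' (assumed cut-off; None = not reported)."""
    if alpha is None:
        return "?"
    return "+" if alpha >= threshold else "-"


# Hypothetical responses: three respondents, three perfectly consistent items.
scores = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
alpha = cronbach_alpha(scores)
print(rate_internal_consistency(alpha))
```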

The Prinsen and Terwee protocol was further modified to omit ratings of content validity, which are more time- and resource-intensive than ratings of other measurement properties. Rating the content of measures directly requires strong knowledge of PROM development and qualitative methodology, expertise in the field of interest (organizational culture and climate), and experience with the target population of interest. Rating content validity to this standard was deemed beyond the scope of the study.

Data Synthesis

Following their appraisal, studies were grouped by measure, with results pooled into a qualitative summary for each measure. This summary was again rated against Prinsen and Terwee’s quality criteria for good measurement properties to obtain a measure rating. Ratings were given as sufficient (+), insufficient (−), inconsistent (±) (i.e., no explanation found for inconsistent results), or indeterminate (?). Finally, to determine the quality of the pooled result rating, the evidence was graded using a modified Grading of Recommendations Assessment, Development, and Evaluation (GRADE) approach as high, moderate, low, or very low quality. Definitions of quality levels are in ESM Appendix Table 3. The modified GRADE approach is adapted from the standard GRADE approach;29 quality of evidence is initially assumed to be high, then downgraded based on four factors: (1) risk of bias of individual studies (i.e., COSMIN ratings of methodological quality), (2) inconsistency, (3) imprecision, and (4) indirectness.24 Rating and grading were done independently by two authors (KSH, JC). Differences were resolved through discussion and consultation with a psychometrics expert (GES) as needed.
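The downgrading logic described above can be sketched as follows. This is an illustrative sketch, not the reviewers' procedure: the function name is hypothetical, and the number of levels to downgrade for each factor is supplied by the rater as an input rather than derived by the code.

```python
LEVELS = ["high", "moderate", "low", "very low"]
FACTORS = {"risk_of_bias", "inconsistency", "imprecision", "indirectness"}


def grade_evidence(downgrades):
    """Start at 'high' and step down by the total number of downgrades.

    downgrades maps a factor name to the number of levels the rater
    chose to downgrade for that factor (a non-negative integer).
    """
    unknown = set(downgrades) - FACTORS
    if unknown:
        raise ValueError(f"unknown factors: {sorted(unknown)}")
    if any(levels < 0 for levels in downgrades.values()):
        raise ValueError("downgrades must be non-negative")
    total = sum(downgrades.values())
    # Quality cannot fall below the lowest level.
    return LEVELS[min(total, len(LEVELS) - 1)]


# Hypothetical example: one level each for risk of bias and indirectness.
quality = grade_evidence({"risk_of_bias": 1, "indirectness": 1})
print(quality)  # low
```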

RESULTS

Study Selection

The electronic database search yielded 1782 records, 42 of which were selected for analysis. The selection process is detailed in Figure 2.

Figure 1 Data extraction, risk of bias, and summary of findings methods.

Across the 42 included studies, we identified 16 culture measures in 25 studies and 11 climate measures in 17 studies. After combining those that were variations of the same source measures (e.g., slight variations in wording or number of items; different domains or subscales), there were a total of 7 culture measures and 8 climate measures. Table 2 summarizes the measure characteristics. Measures varied in length from 15 to 50 items for organizational culture and 6 to 64 items for organizational climate. Number of domains ranged from 3 to 10 domains for culture and from 2 to 7 domains for climate. All but one measure were developed in English (Organizational Climate for Health Care Organizations,71 developed in Spanish). Most used Likert scales; however, some measures from the Competing Values Framework30 used an ipsative scale, where points are distributed among response options. Measure characteristics for individual studies can be found in ESM Appendix Table 4.

Table 2 Characteristics of Included Measures of Organizational Culture and Organizational Climate

Summary of Evidence

Table 3 shows the pooled qualitative summaries of measurement properties, their ratings, and an assigned GRADE for each measure. For ratings of methodological quality and results of measurement properties of individual studies, see ESM Appendix Table 5. Breakdown of GRADE assignments can be found in ESM Appendix Table 6. Overall, the most frequently reported measurement property was internal consistency, followed by structural validity. Cross-cultural validity was evaluated for two measures: the Medical Group Practice Culture Assessment (MGPCA)43 was translated into and adapted for the Italian language,48 and the revised Organizational Culture Profile52 was adapted for the Greek-Cypriot dialect.53 No psychometric properties were reported for seven culture measures and one climate measure. We describe measures that were cited by three or more studies, reported results for three or more measurement properties, had more than one measurement property with high-quality evidence, or were found to be originally developed for primary care (discovered through author correspondence or mentioned in the study).

Table 3 Qualitative Summaries, Ratings, and GRADE for Measurement Properties Pooled by Measure

Organizational Culture Measures

We summarize the findings for 3 of the 16 identified culture measures: Culture Questionnaire adapted for health care settings, Practice Culture Assessment, and revised Organizational Culture Profile. The most frequently used measure, the Culture Questionnaire adapted for health care settings,36 had insufficient internal consistency based on high-quality evidence. The Practice Culture Assessment, used by three studies, had insufficient structural validity based on low-quality evidence and sufficient internal consistency based on moderate-quality evidence. Reasons for low/moderate-quality evidence were indirectness of evidence due to the population being a mix of clinicians and staff and the risk of bias from a lack of very good-quality studies. For the revised Organizational Culture Profile,53 many measurement properties were reported, although most had very low-quality evidence due to a lack of available studies of very good quality. Downgrading also occurred due to risk of bias: structural validity was assessed with a small sample size, and cross-cultural validity testing lacked direct comparisons between two culturally different groups. Among all culture measures, the Practice Culture Assessment56 is the only measure that was found to be originally developed with primary care health professionals and used in its original form. The MGPCA and the Primary Care Organizational Questionnaire59 were also developed in primary care; however, the included studies used modified forms of the original measures.

Organizational Climate Measures

We summarize findings for 4 of the 11 identified climate measures: Nurse Practitioner Primary Care Organizational Climate Questionnaire (NP-PCOCQ), Task and Relational Climate Scale, Practice Climate Survey, and Workplace Climate Survey. The climate measure used by the most studies, all of which were by the measure developer, was the field-tested version of the NP-PCOCQ.63 It also had the most reported measurement properties, with high-quality evidence for sufficient structural validity and internal consistency. The Task and Relational Climate Scale,7 derived from a larger survey administered by Veterans Affairs, also had high-quality evidence for sufficient structural validity and internal consistency, although based on a single study. The NP-PCOCQ,63 Workplace Climate Survey,77 and Practice Climate Survey74 were found to be developed in primary care settings: the NP-PCOCQ with nurse practitioners (NPs), and the Practice Climate Survey and Workplace Climate Survey with primary care providers and staff. The Practice Climate Survey had high-quality evidence for sufficient internal consistency, while the Workplace Climate Survey had no measurement properties supported by high-quality evidence.

DISCUSSION

In this systematic review, we identified and evaluated the psychometric properties of instruments used from 2008 to 2019 to measure organizational culture and organizational climate for primary health care professionals in primary care settings. Overall, we found considerable variability in measures, both conceptually, in that they differ in the domains and subdomains assessed, and in psychometric quality. Only a handful of measures (6 of 27) were used in more than one study, and many studies reported limited or no psychometric information. Accordingly, we were unable to pool many psychometric results.

One explanation for the variability of measures is the lack of a consensus on what domains define these two constructs.9,20,80 Generally, the domains of management, relationship infrastructure, and trust seemed to appear in several culture measures, and leadership, teamwork, and autonomy in climate measures. Thus, while evaluating content validity against a “gold standard” remained outside of our review scope, this work serves as a reference point to better understand the breadth of available domain conceptualizations. Until such standards are developed, we suggest that when choosing an organizational culture or climate measure, in addition to considering the quality of evidence, one should consider if its domains are suitable for one’s study purpose.

Lacking robust high-quality evidence, we cannot confidently recommend any one measure for use in measuring organizational culture or climate in primary care. We highlight below some of the most promising measures based on the evidence available so far. In choosing these measures, we considered each measure's frequency of use, the evidence for robust measurement properties, and whether it was originally developed in primary care.

Recommendations: Organizational Culture

The Culture Questionnaire adapted for health care settings36 is the most frequently used measure of organizational culture in this review. It uses an ipsative scale design, where the respondent distributes 100 points among four items, each representing a culture type. With ipsative scales, the items are interdependent, which may confound analyses of validity and reliability (e.g., internal consistency, factor analysis). However, this design allows culture to be interpreted as a composition of multiple cultures to varying degrees, which may better represent true culture. Additionally, its widespread use makes it easily comparable across studies.
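The interdependence of ipsative items can be made concrete with a small sketch. The responses below are hypothetical, the four items are left unnamed, and the helper names are illustrative assumptions; the only property taken from the text is that each respondent distributes 100 points among four items.

```python
from statistics import pvariance

POINTS_TOTAL = 100
N_ITEMS = 4


def validate_ipsative(response):
    """Check that a response distributes exactly 100 points over four items."""
    if len(response) != N_ITEMS:
        raise ValueError("expected four item scores")
    if any(points < 0 for points in response):
        raise ValueError("points must be non-negative")
    if sum(response) != POINTS_TOTAL:
        raise ValueError("points must sum to 100")
    return response


def implied_fourth(first_three):
    """The fourth score is fully determined by the other three."""
    return POINTS_TOTAL - sum(first_three)


# Hypothetical responses from four respondents.
responses = [(40, 30, 20, 10), (25, 25, 25, 25), (10, 20, 30, 40), (55, 15, 20, 10)]
for response in responses:
    validate_ipsative(response)

# Every respondent's total is the same constant, so total scores never vary.
total_variance = pvariance([sum(r) for r in responses])
print(total_variance, implied_fourth((40, 30, 20)))
```

Because the total score has zero variance, statistics that depend on it, such as Cronbach's alpha, cannot be computed across the four items in the usual way, and any increase on one item forces a decrease elsewhere.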

The Practice Culture Assessment56 is a newer scale originally developed in primary care settings. Testing the measurement properties of this measure for clinicians separately from staff who do not provide patient care can increase the quality of evidence for its psychometric properties.

The MGPCA43 is worth considering, as it is frequently cited as a source measure by our included studies and was also developed for primary care.43 Because each of the included studies used modified versions of the MGPCA that all lack strong validity evidence, clear recommendations cannot be made about which specific version to use. The developers recommend their most up-to-date version, a 17-item measure with eight domains (not included in this review). Testing psychometric properties of this scale and its modified versions is warranted for future use.

Recommendations: Organizational Climate

Among organizational climate measures, the field-tested version of the NP-PCOCQ63 for nurse practitioners has the strongest psychometric evidence, with high-quality evidence for sufficient structural validity and internal consistency. Further strengthening of this tool necessitates validation by investigators independent of the scale developers.

The Practice Climate Survey74 and the Task and Relational Climate Scale7 are more recent measures that were developed in primary care. Additionally, each has at least one good measurement property based on strong evidence, although this is based on a single study for each. These measures would benefit from additional use and validation.

We caution users who may be applying these measures to their own work that, when aggregating individual results to the organizational level using means (as many of our included studies did), variability is overlooked, resulting in information loss and bias.79 Thus, aggregated results should be interpreted with this loss of within-organization variability in mind.
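A brief sketch shows what is lost when individual responses are reduced to an organizational mean. The two practices and their scores are hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical climate scores (1-5 scale) from five respondents per practice.
practice_a = [3, 3, 3, 3, 3]  # everyone reports the same climate
practice_b = [1, 5, 1, 5, 3]  # respondents disagree sharply

mean_a, mean_b = mean(practice_a), mean(practice_b)
spread_a, spread_b = pstdev(practice_a), pstdev(practice_b)

# The means are identical, so a mean-only summary cannot tell the
# practices apart; the spread shows they are very different.
print(mean_a, mean_b)
print(spread_a, round(spread_b, 2))
```

Reporting a measure of within-organization spread alongside the mean is one simple way to keep this information visible.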

Limitations

While COSMIN guidelines provide a strong framework for systematic evaluation of measures, there were two areas where adaptations to these guidelines were necessary in the current review. First, as COSMIN guidelines are tailored to patient-reported outcome measure evaluation, we adapted the scope for relevance to a target population of health care professionals (ESM Appendix Table 2). Second, evaluation of content validity was omitted, as there was insufficient consensus in the literature to provide a gold standard for evaluating conceptual domain scope. Consequently, the ratings assigned in this review should be considered conservative.

At the review level, we recognize that not all studies may have published all methods and results for the measurement properties of the measures they used. This could negatively affect the ratings of methodological quality assigned to them and, subsequently, our final recommendations. To mitigate this limitation, we attempted to contact authors by email to fill in missing information when possible.

Future Directions

Based on our review of organizational culture and climate measures, we offer suggestions to build upon this work. Foremost, investigators should consider drawing on or expanding upon existing tools before developing new, competing tools in this field. Further validation is particularly warranted when adapting existing tools to other settings or populations within primary care, expanding into new subdomains, creating short forms from already validated measures, or recalibrating scoring procedures.

Additionally, clinicians should be examined separately from nonclinical staff when measuring organizational culture or climate. For the Culture Questionnaire adapted for health care settings, pooled internal consistency results improved markedly when practitioners only were examined, compared to when practitioners and staff were examined together (Table 3). The study of Becker et al.76 illustrates how clinicians and staff were given separate, structurally different versions of a climate instrument.

The field of organizational culture and climate is constantly expanding conceptually,10 with no consensus on how these constructs should be measured.20 The dynamic nature of this field presents a limitation to its objective measurement, as demonstrated in our review. The question remains of how best to measure organizational culture and climate objectively. Doing so may require a completely new measure with a more inclusive conceptual framework, or multiple measures to capture the diversity of the concept. To work toward filling these research gaps, one research team has proposed key dimensions to comprise organizational culture in primary care.6 Future work may build upon or confirm similar work and move toward designing standards for these constructs in primary care.

Conclusion

In conclusion, we present a systematic review of instruments that measure organizational culture and climate in primary care settings. A variety of measures were found with diverse and nonuniform dimensions. Overall, more high-quality evidence on their measurement properties in primary care is needed. The lack of a standard framework for culture and climate could be contributing to the difficulty in performing rigorous validity testing. Suggestions for further research include better measurement and reporting of the psychometric properties of existing instruments, exploring differences in culture and climate between practitioners and support staff, and supporting work toward standardizing dimensions for these constructs. We hope that compiling organizational culture and climate measures in a single review can help researchers make more informed decisions when choosing a measure or when deciding to develop a new one.