SNOMED CT Concept Hierarchies for Computable Clinical Phenotypes From Electronic Health Record Data: Comparison of Intensional Versus Extensional Value Sets



Introduction
Overview
Given widespread adoption of electronic health records (EHRs) certified to follow terminology standards, why does achieving interoperable clinical phenotype definitions remain challenging? Practical approaches to analytic interoperability among EHR-originated datasets would provide value both for population health analytics and for clinical research [1][2][3]. Clinical registries define most patient subpopulations (important clinical phenotypes) by either a shared condition or a shared exposure (eg, to a type of procedure or medication) [4]. EHRs now encode patient conditions in clinical terminologies mapped to SNOMED CT, an international comprehensive clinical terminology [5][6][7]. By federal standard, exchanging patient conditions (problems) between EHRs via health information exchanges employs SNOMED CT concepts.
Accordingly, clinical quality measures derived from EHR data increasingly define clinical phenotypes with SNOMED CT concept value sets, analogous to the International Classification of Diseases (ICD) code value sets traditionally defined for claims data. Initial SNOMED CT value sets have primarily taken an "extensional" form (that is, an enumerated list of terms), in keeping with the long-standing structure of ICD code value sets [8]. But SNOMED CT, being a polyhierarchical ontology, affords the powerful option of rule-based or "intensional" value sets that leverage the relationships within the ontology. Such intensional value sets can identify included and excluded subtypes of a clinical condition more concisely by referring to SNOMED CT's hierarchical "is a" supertype-subtype (parent-child) relationships. Those subtype relationships can closely match clinicians' thinking about clinical phenotypes and about which subtypes of a condition they wish to include or exclude. In a report on 125 such hierarchy-based value sets, we have shown that they are also simple to create in an EHR and to employ in an analytic data warehouse [9].
In the United States, the federal Centers for Medicare and Medicaid Services (CMS) employs public quality measures to help assure the quality of health care for Medicare beneficiaries, who are primarily elderly or disabled. In this study, we examined value sets defining 10 conditions referenced by 2018 CMS high-priority electronic clinical quality measures (eCQMs) for adults. We compared corresponding intensional versus extensional SNOMED CT value sets for their conciseness, time to construct, and completeness of SNOMED CT concept inclusion. We also compared their completeness in covering the SNOMED CT-mapped clinical terms that clinicians can select within the EHR as patient Problem List entries and encounter diagnoses, since those selections ultimately drive clinical phenotypes for population health activities and for pragmatic clinical studies employing EHR source data.

Why Value Sets?
Transactional source data from administrative and clinical information systems typically include diagnosis information encoded in either ICD or SNOMED CT. Value sets of specified ICD or SNOMED CT terms define conditions (clinical phenotypes) for use in clinical guidelines, clinical quality measures, and patient registries [4]. Two categories of conditions commonly need to be defined: (1) one or more primary, population-defining conditions and (2) comorbid conditions used for exclusions and/or risk stratification.

Why SNOMED CT Value Sets?
To be certified for the Meaningful Use program in the United States, EHRs must be able to transmit patient diagnosis information to another EHR using SNOMED CT-encoded concepts [10,11]. Thus, in most EHRs clinicians enter patient conditions onto their Problem List either by using SNOMED CT directly or, more commonly, by using a clinician-friendly clinical terminology premapped to SNOMED CT concepts. Because SNOMED CT is more clinically specific than ICD, both methods preserve greater clinical fidelity and more relevant clinical detail than ICD coding does [9].

Why Extensional SNOMED CT Value Sets?
Extensional value sets refer to simple lists of codes or concepts. ICD value sets traditionally have been constructed this way, in keeping with the structure of ICD [12,13]. As the need for SNOMED CT value sets arose, the same approach was continued. The innovative Value Set Authoring Tool made available in 2013 by the Value Set Authority Center (VSAC) initially supported only creation of extensional value sets [14].

Challenges with Extensional SNOMED CT Value Sets
Extensional value sets, as specifically enumerated lists, are brittle and prone to "break" or become stale with updates to the underlying terminology. SNOMED CT updates can include addition of new clinical concepts or refinement of an existing concept by creating or expanding its "descendant" concepts. Preexisting extensional value set lists cannot incorporate these changes automatically and may require frequent updating after each new SNOMED CT version release, followed by reimportation or copying into every EHR or other system employing the value set.
Some extensional value sets include many items, which hinders rapid human comprehension of exactly which subtypes of a given clinical condition are being included and excluded. Clinical vetting of such value sets thus becomes laborious, and constructing the value set and performing quality assurance are correspondingly difficult and labor-intensive. Inaccuracies in value sets can significantly affect clinical quality measure calculations [15].

Intensional Value Sets of SNOMED CT Concept Hierarchies
SNOMED CT intensional value sets, by contrast, are rule-based and leverage the polyhierarchy structure of SNOMED CT. That is, one can include or exclude an entire "tree" of real-world condition subtypes via a single reference to a SNOMED CT concept and all its descendants. Combining such tree references with simple Boolean logic (or with SNOMED CT Expression Constraint Language) enables efficient definition of a desired clinical phenotype [9]. For instance, osteoporosis and all of its subtypes can be defined by reference to one SNOMED CT concept (SCT ID 64859006 Osteoporosis) and all its descendants. The corresponding extensional list would require 42 SNOMED CT concepts to fully define. In turn, in our EHR 2287 diagnosis clinical terms map to this single SNOMED CT concept hierarchy; a clinician selecting any one of these as a patient's diagnosis would automatically include that patient in the broad computable clinical phenotype of osteoporosis.
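To make the "concept and all its descendants" idea concrete, the following Python sketch expands an intensional rule (include these hierarchies AND NOT those hierarchies) into a derived extensional list by walking "is a" relationships downward. It is a minimal illustration under stated assumptions, not the algorithm used by Symedical or Epic: the hierarchy is a tiny hypothetical fragment with placeholder concept names rather than real SNOMED CT IDs, and excluded hierarchies are assumed to take precedence even when a concept is also reachable through an included parent, as a polyhierarchy allows.

```python
from collections import defaultdict, deque

# Hypothetical toy "is a" fragment: child -> parents. Placeholder names, not SNOMED CT IDs.
# "A2b" has two parents to mimic SNOMED CT's polyhierarchy.
IS_A = {
    "A1": ["A"],
    "A2": ["A"],
    "A2a": ["A2"],
    "A2b": ["A2", "B"],
    "B1": ["B"],
}


def children_index(is_a):
    """Invert a child -> parents map into a parent -> children map."""
    children = defaultdict(set)
    for child, parents in is_a.items():
        for parent in parents:
            children[parent].add(child)
    return children


def descendants_or_self(concept, children):
    """Return the concept plus all of its transitive descendants (breadth-first)."""
    seen = {concept}
    queue = deque([concept])
    while queue:
        current = queue.popleft()
        for child in children.get(current, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def expand_rule(include, exclude, is_a):
    """Expand 'include these trees AND NOT those trees' into a flat concept set.

    Assumption: exclusion wins, so a concept under an excluded tree is dropped
    even if it is also reachable from an included tree.
    """
    children = children_index(is_a)
    included = set().union(*(descendants_or_self(c, children) for c in include))
    excluded = set().union(*(descendants_or_self(c, children) for c in exclude))
    return included - excluded


print(sorted(expand_rule(["A"], ["A2"], IS_A)))       # ['A', 'A1']
print(sorted(expand_rule(["A", "B"], ["A2"], IS_A)))  # ['A', 'A1', 'B', 'B1']
```

In practice the descendant closure would be computed from a SNOMED CT release (for example, from its relationship data or a precomputed transitive closure table) rather than from a hand-written dictionary.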
Possible benefits of SNOMED CT intensional value sets include closely matching how clinicians think about what condition subtypes to include or exclude from a given clinical phenotype. Being able to reference the entire tree of a concept's descendants enables far simpler, succinct value set definitions that are easier to understand and construct. Additionally, they should be more resilient to change and less likely to omit descendants and break with future SNOMED CT concept additions. Consequently, intensional value sets have potential to be simultaneously simpler and more complete and thus more useful for population health analytics and clinical research using EHR data.

Objective of the Study
For each of 10 conditions (clinical phenotypes), evaluate the differences between an intensional (concept hierarchy-based) versus extensional (list-based) SNOMED CT value set approach in (1) conciseness, (2) time to create, and (3) completeness of both SNOMED CT concepts included and relevant clinical terms available for clinician selection in an EHR.

Selection of Value Sets
Value sets included in this study were identified starting from the CMS website for choosing Merit-Based Incentive Payment System (MIPS) quality measures [16]. MIPS measures were filtered for high-priority measures and data submission method of EHR, yielding 21 candidate measures. Four measures covering the following 4 common adult conditions were selected: hypertension, diabetes mellitus, depression, and prostate cancer.
Next, the online VSAC "search value sets" feature was employed to find condition-defining SNOMED CT value sets for these measures [17]. Value sets were first filtered for CMS eCQM Release = "eCQM Update 2018 EP-EC and EH" and Code System = "SNOMEDCT." Then each of the eCQMs was selected individually, displaying the related SNOMED CT value sets. Any value sets specifying a condition (diagnosis) were included, yielding an initial total of 12 SNOMED CT extensional value sets (see Multimedia Appendix 1).

Software
Creation of EHR vendor-neutral SNOMED CT intensional value sets and automatic derivation of extensional value sets were both done using Symedical (Clinical Architecture LLC), a clinical terminology management and mapping software tool for health care professionals. SNOMED CT intensional value sets (groupers) for EHR-based registry and clinical decision support functionality were created using the grouper management features of our EHR, Epic (Epic Systems Corporation).

Procedures
Using the VSAC website's "export value set results" feature, the list of codes for each SNOMED CT extensional value set was exported to Excel (Microsoft Corp) for subsequent comparison.
Comparable intensional (rule-based) value set diagnosis groupers for these conditions were established in our EHR. The majority already existed, having been created for disease registries and/or clinical decision support [1]; two were newly created for this study (pain related to prostate cancer, personality disorder). Identically matching intensional value sets were then constructed in Symedical (in addition to Epic), and the time to construct each intensional value set was recorded. Intensional value sets were defined using a "search, drill-up, drill-down" approach previously described [9]. Existing and newly defined intensional value sets were vetted by medical informaticians and clinicians, who reviewed the full derived list of included SNOMED CT concepts.
To enable meaningful direct comparison with intensional value sets, two combinations of VSAC value sets were performed prior to comparing the SNOMED CT concept lists: (1) chronic kidney disease stage 5 (CKD-5) was combined with end-stage renal disease (ESRD), since clinically they refer to the same condition and a single intensional value set covered both, and (2) major depression including remission was combined with dysthymia, as together they constitute the condition of depressive disorders covered by a single intensional value set. This yielded a final set of 10 clinical conditions for comparison. The eCQMs, VSAC value set identifiers, and extensional value set contents are available in Excel format in Multimedia Appendix 1.
The pregnancy value set (2.16.840.1.113883.3.526.3.378) downloaded from VSAC was found to include concepts focused on pregnancy itself but in general did not include concepts for complications or disorders of pregnancy. Our existing EHR-based intensional value set for pregnancy deliberately included the latter to provide a broad net for identifying any pregnant patients via EHR-entered diagnoses and problems. Accordingly, to better match the VSAC contents, we constructed a second narrow intensional value set for pregnancy based on the pregnancy conditions listed in the VSAC extensional value set by deliberately omitting SNOMED CT concepts for pregnancy-related conditions (eg, complication occurring during pregnancy, disorder of pregnancy). The VSAC extensional pregnancy value set was compared separately with both the broad and the narrow intensional pregnancy value sets.
For each intensional value set, a corresponding extensional value set list was automatically derived using Symedical (ie, a list of all included SNOMED CT concept descendants). These derived extensional value sets were downloaded and stored for subsequent analysis. The intensional value sets and corresponding derived extensional value sets are available in Excel format in Multimedia Appendix 2.

Value Set Definition Conciseness
Conciseness of value set definition was measured simply by the number of SNOMED CT concepts needed to fully define the set: either the number of concepts in the list (extensional value set) or the number of concepts in the defining rule (intensional value set). A dimensionless ratio to define was calculated in two forms:
Ratio to define (download) = (# concepts in VSAC-downloaded extensional value set) / (# concepts in intensional value set defining rule)
Ratio to define (derived) = (# concepts in derived extensional value set) / (# concepts in intensional value set defining rule)
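As a quick check of the two ratio definitions, this Python sketch applies them to the personality disorder example reported in the Results (26 concepts in the VSAC download, 48 in the intensional-derived list, 2 concepts in the defining rule). The function name is a hypothetical illustration, not part of any tool used in the study.

```python
def ratio_to_define(extensional_count: int, rule_concept_count: int) -> float:
    """Ratio to define = (# concepts in an extensional list) / (# concepts in the intensional rule)."""
    if rule_concept_count <= 0:
        raise ValueError("the intensional rule must reference at least one concept")
    return extensional_count / rule_concept_count


# Personality disorder figures from the Results section.
download_ratio = ratio_to_define(26, 2)  # ratio to define (download)
derived_ratio = ratio_to_define(48, 2)   # ratio to define (derived)
print(download_ratio, derived_ratio)     # 13.0 24.0
```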

Time to Construct
The purpose of the "Time to Construct" measure is to gauge the time each health care organization needs to construct, in its local systems such as its EHR, an approved value set definition received from a defining group such as VSAC (or a local clinical terminology committee). The preceding upfront "time to define" the value set, including iterative clinical review, is purposefully not included.
We measured the time to construct in Symedical each of the 11 intensional value sets (including both pregnancy value set versions) and 3 of the extensional value sets (CKD-5 and ESRD; prostate cancer; pain related to prostate cancer). From these measurements, a best-fit linear equation was derived: time (min) = 0.4177 × (# SNOMED CT concepts) + 3.8707. This corresponds to a fixed overhead of just under 4 minutes to construct any value set (eg, for configuring basic common settings), plus approximately 0.42 minutes (25 seconds) for each SNOMED CT concept added. The time to construct the remaining extensional value sets was estimated using this equation.
The difference in time to construct an extensional versus an intensional value set was calculated as (time to construct extensional value set) - (time to construct intensional value set), expressed in minutes. The dimensionless ratio was calculated as (time to construct extensional value set) / (time to construct intensional value set).
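Because the construction-time model is a straight line, the estimates, differences, and ratios can be reproduced directly from the two published coefficients. This Python sketch applies the equation to the personality disorder example from the Results (48 concepts in the derived extensional list versus a 2-concept rule). Note the assumption: in the study, intensional construction times were measured directly, so applying the equation to the intensional side here is for illustration only.

```python
# Coefficients of the best-fit line reported above.
SLOPE_MIN_PER_CONCEPT = 0.4177
INTERCEPT_MIN = 3.8707


def estimated_minutes(n_concepts: int) -> float:
    """Estimated construction time (minutes) for a value set entering n_concepts concepts."""
    return SLOPE_MIN_PER_CONCEPT * n_concepts + INTERCEPT_MIN


def compare_construction(extensional_concepts: int, intensional_concepts: int):
    """Return (difference in minutes, dimensionless ratio), extensional vs intensional."""
    ext = estimated_minutes(extensional_concepts)
    intl = estimated_minutes(intensional_concepts)
    return ext - intl, ext / intl


diff_min, ratio = compare_construction(48, 2)
print(round(estimated_minutes(48), 2), round(estimated_minutes(2), 2))  # 23.92 4.71
print(round(diff_min, 2), round(ratio, 2))                              # 19.21 5.08
```

Because the intercept is paid once per value set, the ratio for a small list stays modest; it grows toward the slope-driven limit only as the extensional list becomes long relative to the rule.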

Completeness: SNOMED CT Concepts
For each of the 10 conditions, the lists of SNOMED CT concepts included in the VSAC-downloaded set and in the intensional-derived set were compared. The total number of concept discrepancies (concepts present in one set but not the other) was assessed by summing two discrepancy types:
• Number of concepts present in the VSAC-downloaded set but not in the intensional-derived set
• Number of concepts present in the intensional-derived set but not in the VSAC-downloaded set
Since virtually all of the SNOMED CT concepts in the downloaded extensional value sets were included in the corresponding intensional-derived value set, the ratio of the two was calculated as (# concepts in intensional-derived set) / (# concepts in VSAC-downloaded set), expressed as a number greater than 1. The percentage of SNOMED CT concepts included in the downloaded extensional value set was calculated as (# concepts in VSAC-downloaded set) / (# concepts in intensional-derived set), expressed as a percentage.
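The discrepancy counts, ratio, and percentage defined above map directly onto set operations. The following Python sketch uses small hypothetical sets of placeholder concept identifiers (not study data) to show the calculation. The percentage follows the formula in the text, which equals the share of derived concepts actually present in the download only when the download is (almost) a subset of the derived set, the condition the text relies on.

```python
def concept_completeness(vsac: set, derived: set) -> dict:
    """Compare a VSAC-downloaded concept list with an intensional-derived list."""
    only_vsac = vsac - derived       # in the download but missing from the derived list
    only_derived = derived - vsac    # in the derived list but missing from the download
    return {
        "only_in_vsac": len(only_vsac),
        "only_in_derived": len(only_derived),
        "total_discrepancies": len(only_vsac) + len(only_derived),
        "ratio_derived_to_vsac": len(derived) / len(vsac),
        # Formula from the text: assumes the download is (nearly) a subset of the derived set.
        "pct_included_in_vsac": 100 * len(vsac) / len(derived),
    }


# Hypothetical placeholder identifiers, not SNOMED CT concepts.
vsac_download = {"c1", "c2", "c3", "c9"}
intensional_derived = {"c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8"}
result = concept_completeness(vsac_download, intensional_derived)
print(result)
```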

Completeness: Electronic Health Record Clinical Term Coverage
To evaluate the impact of condition-specific discrepancies, value sets were created in the EHR in both an intensional form (existing) and an extensional form (to exactly match the VSAC list of concepts, without including descendants). The EHR automatically creates a compiled list of clinical terms, sourced from Intelligent Medical Objects (IMO), that are mapped to the SNOMED CT value set. These IMO clinical terms comprise the diagnoses visible to clinicians for selection as Problem List entries and as diagnoses to associate with patient orders, encounters, and professional charges. The number of clinical terms compiled for each intensional and extensional value set was recorded. Comparisons were then performed on the number of clinical terms available for selection by clinicians in the EHR that would result in patient inclusion in a given clinical phenotype.
Just as for SNOMED CT concept completeness, the ratio of the two was calculated as (# clinical terms from intensional-derived set) / (# clinical terms from VSAC-downloaded set), expressed as a number greater than or equal to 1. The percentage of clinical terms covered by the downloaded extensional value set was calculated as (# clinical terms from VSAC-downloaded set) / (# clinical terms from intensional-derived set), expressed as a percentage.
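The EHR's compilation step can be pictured as filtering a term-to-concept map by value set membership, after which the ratio and percentage above follow. The sketch below assumes, for simplicity, that each clinician-facing term maps to exactly one SNOMED CT concept; the term names and concept labels are hypothetical, and the actual IMO mapping and Epic grouper compilation are not modeled.

```python
def compile_terms(value_set_concepts: set, term_to_concept: dict) -> set:
    """Return the clinician-selectable terms whose mapped concept is in the value set."""
    return {term for term, concept in term_to_concept.items() if concept in value_set_concepts}


def term_coverage(intensional_terms: set, extensional_terms: set):
    """Return (ratio intensional/extensional, percentage covered by the extensional set)."""
    if not intensional_terms or not extensional_terms:
        raise ValueError("both compiled term lists must be nonempty")
    ratio = len(intensional_terms) / len(extensional_terms)
    pct = 100 * len(extensional_terms) / len(intensional_terms)
    return ratio, pct


# Hypothetical term -> concept map (one concept per term, by assumption).
TERM_TO_CONCEPT = {
    "term 1": "A", "term 2": "A1", "term 3": "A2",
    "term 4": "A2a", "term 5": "A2a", "term 6": "B",
}
intensional_terms = compile_terms({"A", "A1", "A2", "A2a"}, TERM_TO_CONCEPT)
extensional_terms = compile_terms({"A", "A1"}, TERM_TO_CONCEPT)
ratio, pct = term_coverage(intensional_terms, extensional_terms)
print(ratio, pct)  # 2.5 40.0
```

Note that one omitted concept ("A2a" here) can drop several clinician-facing terms at once, which is why term coverage can diverge from concept coverage.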

Overall Format of Result Tables
Tabulated comparisons by each of the 10 conditions follow. Summary calculated measures are included at the bottom of each table. In addition to overall sums and ratios, the median of the 10 condition-specific values was selected as the primary measure of central tendency. This method was chosen a priori to avoid the potential for skew if one or more conditions exhibited marked difference from the others or contained many more concepts. The minimum, maximum, and range across the 10 conditions are also reported.
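A short Python sketch of the summary statistics, using the standard library's statistics.median, illustrates why the median was preferred: one extreme condition can pull the mean far from the typical value while barely moving the median. The ten values below are hypothetical, not the study's table entries. With an even number of conditions, the median is the average of the fifth and sixth sorted values, which is why medians such as 21.5 and 78.5 can appear in the tables.

```python
from statistics import mean, median


def summarize(values):
    """Median (primary measure), minimum, maximum, and range across conditions."""
    return {
        "median": median(values),
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
    }


# Hypothetical per-condition values; one outlier (40) mimics a condition with many more concepts.
values = [2, 40, 3, 1, 6, 3, 10, 4, 3, 5]
summary = summarize(values)
print(summary)       # {'median': 3.5, 'min': 1, 'max': 40, 'range': 39}
print(mean(values))  # 7.7, pulled upward by the single outlier
```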
For pregnancy, both the narrow and broad definitions are shown in tabular form; however, only the narrower intensional value set, based on the CMS extensional value set, was used in all summary calculations (to avoid double-counting). Using the narrow pregnancy definition reduces the reported differences between intensional and extensional value sets, so the summary findings and conclusions are conservative. Had the broad pregnancy definition been selected instead, the magnitude of effects would be larger. All tables are available in Excel format in Multimedia Appendix 3.

Value Set Conciseness
We expected intensional value sets to be more concise to define, since they leverage the hierarchical supertype-subtype structure of SNOMED CT. Table 1 shows that the median number of SNOMED CT concepts employed to define a condition with the VSAC value sets was 21.5 concepts versus only 3.0 for intensional value sets. For the full extensional value sets derived from the intensional rules, the median number of concepts-to-define was 78.5 concepts. The median ratio of concepts needed to fully define an equivalent extensional value set was 25 times that needed for the intensional value set rule.
As one example, the clinical phenotype of personality disorder is specified by 26 SNOMED CT concepts in the downloaded extensional value set (Figure 1). In contrast, the corresponding intensional value set rule (with the intended subtypes inferred by examining the VSAC-downloaded list) includes just two concepts: (1) Personality disorder (SCT ID 33449004), including descendants, AND NOT (2) Organic personality disorder (SCT ID 36217008), including descendants. This rule includes all 26 SNOMED CT concepts in the VSAC extensional value set plus 22 additional closely related concepts that reasonably belong, for 48 included concepts and a concept ratio to define of 48/2, or 24.0.
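The same two-concept rule can be written in SNOMED CT Expression Constraint Language (ECL), where "<<" means "descendant or self of" and MINUS removes a hierarchy. The Python helper below is a hypothetical sketch that renders include/exclude concept lists as an ECL string; it does not validate IDs, does not escape pipe characters in terms, and its output should be checked against the ECL specification before use.

```python
def _disjunction(concepts, bracket):
    """Render (id, term) pairs as '<< id |term|' joined by OR, bracketed when compound."""
    parts = [f"<< {cid} |{term}|" for cid, term in concepts]
    joined = " OR ".join(parts)
    return f"({joined})" if bracket and len(parts) > 1 else joined


def to_ecl(include, exclude=()):
    """Build an ECL expression: included hierarchies MINUS excluded hierarchies."""
    if not include:
        raise ValueError("at least one included hierarchy is required")
    if not exclude:
        return _disjunction(include, bracket=False)
    # Compound operands of MINUS must be bracketed in ECL.
    return _disjunction(include, bracket=True) + " MINUS " + _disjunction(exclude, bracket=True)


personality_disorder_ecl = to_ecl(
    include=[("33449004", "Personality disorder")],
    exclude=[("36217008", "Organic personality disorder")],
)
print(personality_disorder_ecl)
# << 33449004 |Personality disorder| MINUS << 36217008 |Organic personality disorder|
```

A compact, shareable string like this is one possible vehicle for exchanging the rule itself rather than its 48-concept expansion.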

Time to Construct
Not surprisingly, more concise value sets are easier and faster to construct, perform quality assurance on, review, and update as needed. As shown in Table 1, it takes 6 to 8 times longer to construct an extensional value set completely equivalent in contents to an intensional value set (median 6.7, overall ratio 7.9). In this set, constructing intensional value sets (groupers) for all 10 conditions was accomplished in just 1 hour (60 minutes) of keyboard time, while constructing the equivalent extensional value sets required nearly 11 hours (650 minutes).
The median construction time for these 10 conditions was 5 minutes for an intensional value set and 37 minutes for an equivalent extensional value set.

Completeness: SNOMED CT Concepts
SNOMED CT is updated twice yearly [18], and an intensional rule-based approach should be more resilient to these updates, for instance by automatically including new descendants within an existing included hierarchy. Accordingly, we examined the relative completeness of downloaded extensional versus corresponding intensional value sets. Table 2 compares the number of SNOMED CT concepts included in the full extensional list derived from the intensional rule versus the extensional 2018 list downloaded from VSAC. Across the 10 conditions, the full derived list included a median of 3.3 times as many SNOMED CT concepts as the corresponding downloaded list (range 1.1 to 19.4). In percentage terms, a median of only 35% of SNOMED CT concepts in the full derived extensional list were present in the corresponding downloaded extensional list (range 5% to 91%), as shown in Figure 2 (left panel).
The vast majority of discrepancies between the two sets of extensional lists (877/889, 98.7%) were present in the intensional-derived list only and missing from the VSAC-downloaded list (Table 2); 1.3% (12/889) of concepts in the VSAC download were not in the intensional-derived list. Of these 12, 6 were kidney transplant procedural concepts rather than disorder or condition concepts and had no corresponding diagnosis clinical terms defined in the EHR's clinical terminology. The remaining 6 were judged clinically relevant omissions from the intensional-derived list.

Completeness: Coverage of Relevant Electronic Health Record Clinical Terms
Pragmatic clinical trials, registries, and other research projects that rely on EHR data for clinical phenotypes need the most accurate and complete value sets possible to define primary and comorbid conditions. We thus compared the number of EHR clinical terms (sourced from IMO, overall n>800,000) selectable by clinicians that are in extensional versus intensional value set compiled lists (see Table 3 and Figure 2, right panel).
In 9 of 10 conditions, the number of EHR clinical terms identified using the downloaded extensional value set was less than when using the corresponding intensional value set, in some cases dramatically so. In this subset of 10 conditions, a median 65% of the EHR diagnostic clinical terms selectable by clinicians in a commonly used EHR are included when using a published list-based extensional value set compared with using a corresponding concept hierarchy-based intensional value set. That is, a median of 35% of clinician-selectable diagnosis terms in the EHR for defining a clinical phenotype are missing when using a 2018 downloaded extensional value set.

Principal Results
For 10 conditions referenced by the CMS 2018 high-priority clinical quality measures, we compared extensional SNOMED CT lists of codes downloaded in the fall of 2018 from the VSAC with intensional (rule-based) value sets for the same conditions. Intensional value set definitions were far more concise (median number of concepts needed for equivalent value sets 3 vs 75), faster to construct (median 5 vs 37 minutes each), and more complete. VSAC-downloaded value sets were missing a median 65% of the SNOMED CT concepts included in the intensional rule-based value sets and 35% of the mapped diagnosis clinical terms selectable by clinicians within the EHR.
The conciseness of intensional value sets expedites construction in the EHR. This should also streamline vetting with busy clinical experts and harmonizing multiple value set specifications of the same real-world condition. Many systems will directly import large extensional value set files, mitigating the value set construction time/workload difference for those organizations. Still, someone must create the source value sets initially and periodically update them, and some customer organizations will have to enter them manually as well. For those, the large time reduction offered by intensional value set definitions remains an advantage. Because intensional value sets are rule-based and can include references to all descendants of a parent concept, they are more likely to include all relevant concepts than an enumerated list. That is, they are less likely to inadvertently omit descendant concepts and more likely to remain complete following future SNOMED CT updates.

Clinical Phenotyping for Clinical-Translational Studies Using Electronic Health Record Data
Pragmatic clinical trials, registries, and other clinical and translational research studies that employ EHR data for computable clinical phenotypes (rather than manual abstraction) rely on value sets that define primary and comorbid conditions as accurately and completely as feasible [19,20]. Concern typically centers on diagnosis data that clinicians have not yet entered in the EHR on the patient's Problem List or as Encounter Diagnoses. While Problem List completeness in particular remains a subject of active inquiry and improvement efforts [21][22][23][24][25][26], this study raises a different concern: the completeness of value set definitions when physicians and advanced practice providers have conscientiously recorded their patients' specific diagnoses in the EHR. In this subset of 10 conditions, a median 35% of the EHR diagnostic clinical terms selectable by clinicians in a commonly used EHR are missing when using a published extensional value set compared with using a simpler rule-based intensional value set. Patients for whom clinicians select those missing EHR terms will fail to be included in the population with the clinical phenotype. Controlling for comorbid conditions in multivariable modeling will similarly be compromised by missing clinical EHR terms. Defining clinical phenotypes more completely with rule-based intensional value sets leveraging SNOMED CT's hierarchical structure advances the feasibility and reliability of pragmatic clinical studies and learning health care system cycles conducted with EHR data produced during normal clinical care [27][28][29].

Analytic Interoperability for Population Health
With the expansion of clinically integrated networks and cross-institution specialty registries to provide and measure value-based care, definition of subpopulations of patients becomes crucial for risk assessment and tailored interventions [1][30][31][32]. Many networks encompass a variety of EHRs. Since the designated interoperability language between EHRs for diagnoses (conditions) is SNOMED CT, employing SNOMED CT value sets enables EHR-agnostic consistent definition of subpopulations for registries, clinical decision support to promote best practices within the EHR, care gap closure, and quality measurement [9]. This provides analytic interoperability across disparate EHRs even if using clinical terminologies from different vendors. The populations that would most benefit from intervention may change over time, thus generating requests for new computable clinical phenotype definitions. The conciseness and clinical understandability of intensional value sets streamline rapid-cycle definition and vetting by specialists, as well as more facile and consistent implementation across a broad range of EHRs, population health tools, and clinical settings. These advantages make employing SNOMED CT concept rule-based intensional value sets a higher quality, better fit-for-purpose method for defining computable clinical phenotypes for population health than traditional extensional lists.

Authoring Practice Guidelines and Electronic Clinical Quality Measures for Streamlined Implementation
With the expansion in medical knowledge and appreciation of the complexities of achieving optimal care for subpopulations of patients with a wide variety of conditions, the number of clinical practice guidelines continues to grow [33,34]. Significant effort and expense (in terms of experts' time) go into writing consensus guidelines and optimal practices for a condition. Achieving real-world practice change takes a long time and is often incomplete [35][36][37]. EHR-based clinical decision support has been shown to improve clinical process measures across multiple clinical domains [38][39][40][41][42][43][44][45][46][47][48].
Yet current guidelines can be difficult to implement as point-of-care clinical decision support that helps "make the right thing the easy thing to do" for busy clinicians within their daily work tool, the EHR [49][50][51][52][53]. Non-value-added work can include:
• Translating prose definitions of conditions covered by the guideline, conditions excluded, and comorbid conditions into value sets implementable in EHRs to cover the clinical terms/codes present in EHRs in practice
• Translating prose definitions of medication types and/or procedure types into EHR-implementable value sets
• Translating prose descriptions (and some flow charts and decision trees, if constructed ambiguously) into implementable decision algorithms for clinical decision support logic [54]
EHRs have local codes that can hamper implementation, but increasingly these are mapped to standard terminology codes to achieve interoperability with other EHRs as organizations participate in health information exchanges [27].
To accelerate implementation, we propose that specialty guideline and eCQM writing committees include a medical informaticist (as either a consultant or a formal member of the writing group representing a clinical informatics specialty society). During initial guideline development discussion and through subsequent detailed specification, the medical informaticist could then assist specialist experts on the committee in expressing the clinical conditions relevant to the guideline or eCQM in a SNOMED CT supertype-subtype form, readily implementable in an EHR or other internet-accessible repository as a concise, easily shareable intensional rule (Figure 3).
Doing so would avoid the considerable extra work of constructing a de novo extensional value set, vetting the full list with clinical experts, distributing it, and having multiple teams of EHR analysts and clinical informaticists around the country independently reverse-engineer the list into a supertype-subtype rule-based form to gain its benefits of conciseness, maintainability, completeness, and understandability for their local EHR implementation. In lean terms, that extra work (red arrows in Figure 3) could be considered non-value-adding waste. In contrast, coproducing concise, shareable SNOMED CT intensional value sets contemporaneously with the guideline and/or eCQM specification would expedite practical dissemination of clinical decision support to promote the new best practice in a consistent subpopulation of patients across the country, matching the guideline writing specialists' intent.

Navigating the SNOMED CT Hierarchy and Selecting Concepts for an Intensional Value Set
Because of the polyhierarchical structure of SNOMED CT, there is potential to inadvertently include descendant branches and/or individual concepts that do not belong. The "search, drill-up, drill-down" approach we employed mitigates that risk by explicitly exploring whether the currently selected concept in a SNOMED CT hierarchy browser is too general or too narrow [9]. A helpful additional mitigation strategy is to expand the intensional rule to show all included SNOMED CT concepts as a derived extensional list (we used Symedical for this purpose) and then have a clinician review this list for any additional concepts that should be excluded. These can then similarly be evaluated with the "search, drill-up, drill-down" method to find the optimal concept in the hierarchy to exclude along with its descendants.

Changes to SNOMED CT
Importantly, although intensional value sets retain accuracy and completeness across many updates to SNOMED CT's contents, they are not impervious to changes [55][56][57]. With intensional rules referencing SNOMED CT's hierarchical structure, additions of new descendants are generally automatically included. Some value sets may need clinical vetting for updates after new SNOMED CT releases, perhaps particularly when an intensional rule includes some, but not all, of a parent concept's children. To enhance rapid re-vetting when needed, automated detection of new SNOMED CT concept additions that are within the span of a given rule-based grouper would be useful. One question to explore further is whether a specific inclusion strategy (include these specific siblings) versus an exclusion strategy (include all the children of the parent except these specific children) proves more resilient (remains more complete and accurate).
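One way such automated detection might work is to expand the same rule against the old and new releases and compare, while also flagging brand-new concepts that sit under a parent the rule partially covers but that the rule does not include. The Python sketch below is a hypothetical illustration: the two tiny parent-to-children maps stand in for consecutive SNOMED CT releases, and the "unreviewed neighbor" heuristic is our assumption, not an existing tool feature. The usage lines also contrast the inclusion strategy (list specific siblings) with the exclusion strategy (include the parent, exclude specific children) raised above.

```python
from collections import deque


def descendants_or_self(concept, children):
    """Concept plus all transitive descendants, given a parent -> set(children) map."""
    seen, queue = {concept}, deque([concept])
    while queue:
        for child in children.get(queue.popleft(), ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def expand(include, exclude, children):
    """Included hierarchies minus excluded hierarchies (exclusion wins)."""
    inc = set().union(*(descendants_or_self(c, children) for c in include))
    exc = set().union(*(descendants_or_self(c, children) for c in exclude))
    return inc - exc


def release_diff(include, exclude, old_children, new_children):
    """Return (added, removed, unreviewed) for one rule across two releases.

    unreviewed: concepts new in this release that are children of a parent the
    rule touches, yet are not included by the rule (hypothetical heuristic).
    """
    old = expand(include, exclude, old_children)
    new = expand(include, exclude, new_children)
    old_all = set(old_children) | {c for kids in old_children.values() for c in kids}
    touching_parents = [p for p, kids in new_children.items() if kids & new]
    neighbors = {c for p in touching_parents for c in new_children[p]}
    return new - old, old - new, (neighbors - new) - old_all


# Hypothetical releases: sibling D and grandchild A2 are new.
OLD = {"P": {"A", "B", "C"}, "A": {"A1"}}
NEW = {"P": {"A", "B", "C", "D"}, "A": {"A1", "A2"}}

inclusion = release_diff(["A", "B"], [], OLD, NEW)  # list specific siblings
exclusion = release_diff(["P"], ["C"], OLD, NEW)    # parent minus specific children
print(inclusion)  # added {'A2'}, removed set(), unreviewed {'D'}
print(exclusion)  # added {'A2', 'D'}, removed set(), unreviewed set()
```

Under the inclusion strategy, the new sibling D is not picked up and is flagged for review; under the exclusion strategy, D is included automatically, which is convenient if D belongs in the phenotype and a silent error if it does not.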
Migrations of existing SNOMED CT concepts to a different location in the hierarchy due to cleanup of SNOMED CT quality issues [58] pose a different challenge, although in many cases an intensional value set will handle the correction gracefully [9]. As clinicians and medical informaticists work with intensional value sets to define important clinical phenotypes, iterative improvements in SNOMED CT's hierarchical arrangement will likely ensue, following the data quality aphorism "what gets used, gets better" [1].

Scope of This Paper's Analysis and Differences in Value Set Intent
One limitation of this paper is that the comparable intensional value sets were developed and vetted at only one institution (University of Texas Southwestern Medical Center) and cannot be considered to represent national specialty society views. However, our experience demonstrates that it is feasible for a medical informaticist to build an initial candidate rule defining a condition and then identify any clinical inclusion/exclusion questions for vetting with a clinician specializing in that condition [9]. For multi-institutional and/or specialty society vetting, a modified Delphi technique can be employed, as Buchanan [59] previously used successfully to gain working consensus across institutions. Our vision is that intensional value sets will increasingly be produced as a byproduct of clinical guideline and eCQM authoring, dramatically reducing the need for individual institutions to reinvent the wheel (Figure 3).
Figure 3. Clinicians' thinking about type/subtype inclusion criteria matches SNOMED CT concept hierarchy implementation in electronic health records (EHRs). Deriving flat code lists (red arrows) requires reverse engineering by each EHR team to achieve the benefits of a concept hierarchy-based definition.
This study covers only 10 conditions, which may not be representative of all conditions; it should therefore be considered a deep dive into one set of conditions for CMS-designated eCQMs (as described in the Methods section). We took a conservative approach in matching intensional definitions to VSAC-downloaded extensional definitions; otherwise, the reported differences in completeness between intensional and extensional value sets would have been even greater. Specifically, for pregnancy we did not use our existing broad pregnancy intensional value set; instead, we constructed a new, much more narrowly defined value set intended to match the scope of the VSAC-released pregnancy value set. Similarly, for pain related to prostate cancer, we lacked an existing intensional value set and constructed a new one that closely mirrored the contents of the VSAC extensional value set. Both choices minimize differences between the extensional and intensional approaches.
On the other hand, for hypertension our existing intensional value set includes all forms of hypertension (intended to represent the scope covered by recent hypertension guidelines [60][61][62]), whereas the VSAC-downloaded extensional value set was specific to essential (primary) hypertension. The latter did not include SNOMED CT concepts for the general concept "Hypertensive disorder, systemic arterial (disorder)" (not specified as primary or secondary), for secondary hypertension, or for "Complication of systemic hypertensive disorder (disorder)". Replacing our existing hypertension intensional value set with one mirroring the contents of the VSAC-downloaded essential hypertension value set would have increased this condition's % completeness for both SNOMED CT concepts and EHR terms. However, because we pre-specified medians rather than means (averages), doing so would not change the overall median values reported: 35% completeness for SNOMED CT concepts and 65% completeness for EHR clinical terms.
Inconsistencies in SNOMED CT polyhierarchy "is a" definitions may lead to inadvertent inclusion of unwanted descendants of a seemingly wholly appropriate SNOMED CT concept. Using the "search, drill-up, drill-down" method when defining an intensional value set can reduce the likelihood of this, as can clinical review of the full list of SNOMED CT concepts derived from the intensional definition [9]. When discovered, such unwanted descendants can be specifically excluded in the intensional rule. Requests to update the "is a" relationship in SNOMED CT to more specific parent(s) can also be submitted through the SNOMED CT Content Request Service. Once the subsumption has been updated in SNOMED CT, the intensional rule can typically be simplified further.
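The exclude-then-simplify cycle described above can be illustrated with a toy hierarchy in which one concept (hypothetically labeled `misplaced`) carries an inappropriate second "is a" parent. This sketch assumes "is a" links are given as a child-to-parents mapping; all labels are hypothetical, and the "after" hierarchy stands in for a release in which a content request has removed the faulty link. It checks that the patched rule written before the fix selects the same concepts as the simpler rule afterward.

```python
from collections import defaultdict, deque


def descendants_or_self(root, is_a):
    """Walk "is a" links downward from root (analogous to ECL's << operator)."""
    kids = defaultdict(set)
    for child, parents in is_a.items():
        for parent in parents:
            kids[parent].add(child)
    seen, queue = {root}, deque([root])
    while queue:
        for child in kids[queue.popleft()]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


# Before: "misplaced" has two parents, one of which wrongly places it under "condition".
BEFORE = {
    "subtype_1": {"condition"},
    "subtype_2": {"condition"},
    "misplaced": {"subtype_2", "other_condition"},
    "misplaced_variant": {"misplaced"},
}
# After a (hypothetical) content request: the "is a" link to subtype_2 is removed.
AFTER = {**BEFORE, "misplaced": {"other_condition"}}

# Patched rule needed before the fix: << condition MINUS << misplaced
patched_rule = descendants_or_self("condition", BEFORE) - descendants_or_self("misplaced", BEFORE)
# Simplified rule once the subsumption is corrected: << condition
simplified_rule = descendants_or_self("condition", AFTER)

print(sorted(patched_rule))
print(patched_rule == simplified_rule)
```

The exclusion also removes `misplaced_variant`, which inherits its inappropriate placement from `misplaced`; after the correction, neither concept falls under `condition`, so the MINUS clause can be dropped.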

Conclusions
Although extensional lists of codes have long been used for ICD-based value sets, using extensional lists of SNOMED CT codes is suboptimal because it fails to leverage the capabilities and clinical relevance of the ontological relationships within SNOMED CT. Compared with SNOMED CT extensional (list) value sets, intensional (rule-based) value sets are far simpler to create, maintain, understand, and vet with specialist clinicians. For the 10 conditions studied here from the 2018 CMS high-value eCQMs for the MIPS program, intensional value sets also proved substantially more complete than their corresponding extensional list versions: a median 35% of diagnosis terms selectable by clinicians in the EHR were missing when using a downloaded extensional value set, creating a risk of failing to identify patients with a given clinical phenotype despite physician-entered discrete diagnoses in the EHR.
Consequently, we believe that in the EHR era, definitions of conditions as computable clinical phenotypes should preferentially employ SNOMED CT concept hierarchy-based (intensional) value sets rather than extensional lists. By doing so, clinical guideline and eCQM authors can streamline broad EHR implementation of condition-specific decision support that promotes guideline adherence and patient benefit.