Interpreting pathology test result values with comparators (< , >) in Electronic Health Records research: an OpenSAFELY short data report

Background
Numeric results of pathology tests are sometimes returned as a range rather than a precise value, e.g. “<10”. In health data research, test result values above or below clinical threshold values are often used to categorise patients into groups; however, comparators (<, >, etc.) are typically stored separately from the numeric values and often ignored, although they may influence interpretation.

Methods
With the approval of NHS England, we used routine clinical data from 24 million patients in OpenSAFELY to identify pathology tests with comparators commonly attached to result values. For each test we report: the proportion returned with comparators present, split by comparator type and geographic region; the specific numeric result values returned with comparators; and the associated reference limits.

Results
We identified 11 common test codes where at least one in four results had comparators. Three codes related to glomerular filtration rate (GFR) tests/calculations, with 31-45% of results returned with “≥” comparators. At least 90% of tests with numeric values 60 and 90 represented ranges (≥60 and ≥90 respectively) rather than exact values. The other tests (four blood tests: Nucleated red blood cell count, Plasma C reactive protein, Tissue transglutaminase immunoglobulin A and Rheumatoid factor; two urine tests: albumin/microalbumin; and two faecal tests: calprotectin and quantitative faecal immunochemical test) were returned with “≤” comparators (29-86% of results).

Conclusions
Comparators appear commonly in certain pathology tests in electronic health records. For most of the commonly affected tests, we expect minimal implications for researchers in most use-cases. However, care should be taken over whether results falling exactly on clinical threshold values should be considered “normal” or “abnormal”. Results from GFR tests/calculations cannot reliably distinguish between mild kidney disease (stage G2, 60-<90) and healthy kidney function (90+). More broadly, health data researchers using numeric test result values should consider the impact of comparators.



Introduction
OpenSAFELY is a new platform for COVID-19 research using the electronic health records of patients registered with general practices in England. Pathology test results recorded in general practice, such as blood or urine tests, can give us important information about patients' health that is relevant to various aspects of COVID-19 research. These include: grading the severity of patients' pre-existing conditions (which may affect their risk from COVID-19) 1; monitoring changes in health condition (for example after contracting COVID-19); or assessing healthcare service provision (e.g. whether people with diabetes generally experienced worsening disease during the COVID-19 pandemic).
Researchers may apply clinical threshold values to classify patients' numeric result values into severity categories, for example for chronic kidney disease 2. However, for some tests, result values can be reported as a range, indicated by a "less than" or "greater than" comparator, e.g. "<10". This may be due to the measurement limits of the machines used to analyse the sample, or simply to report a "normal" result by reference to a clinical threshold. For some tests this type of result is very common, but comparators are stored separately from the numeric result values in electronic health records and are often not taken into consideration when researchers extract test results. This means that values could be grouped incorrectly, e.g. "<10" would be taken as "10" and incorrectly grouped with values "≥10". Similarly, it may be impossible to determine how a given value relates to a clinical threshold: e.g. if grouping patients as above or below a clinical threshold of 5, a value reported as "<10" could fall in either group.
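To make the grouping problem concrete, a result with a comparator can be treated as an interval rather than a point and compared with the threshold. The sketch below is illustrative only, not the method used in this report; it assumes a rule of the form "abnormal if value ≥ threshold", and the function name `classify` and its three return labels are hypothetical.

```python
from typing import Optional

# Comparators treated as exact results ("=", "~" or no comparator).
EXACT = {None, "", "=", "~"}


def classify(value: float, comparator: Optional[str], threshold: float) -> str:
    """Place a result relative to the rule 'abnormal if value >= threshold'.

    Returns "below", "at_or_above" or "indeterminate" (the reported range
    straddles the threshold).
    """
    if comparator in EXACT:
        return "at_or_above" if value >= threshold else "below"
    if comparator == "<":  # true value is strictly below `value`
        return "below" if value <= threshold else "indeterminate"
    if comparator in ("≤", "<="):  # true value is at most `value`
        return "below" if value < threshold else "indeterminate"
    if comparator in (">", "≥", ">="):  # true value is at least `value` (strictly above for ">")
        return "at_or_above" if value >= threshold else "indeterminate"
    raise ValueError(f"unrecognised comparator: {comparator!r}")
```

Dropping the comparator turns "<10" into 10, which `classify(10, None, 10)` places at or above a threshold of 10; keeping it, `classify(10, "<", 10)` returns "below", and `classify(10, "<", 5)` returns "indeterminate", matching the threshold-of-5 example above.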
We therefore set out to identify the most affected tests, how often they have comparators and in which direction. We also identify the numeric values most often associated with comparators and compare them with reference limits, in order to inform suitable threshold values for researchers. This report is intended to support all researchers and studies carried out in OpenSAFELY-TPP informing responses to the COVID-19 pandemic, and those working with similar data elsewhere.

Study design
Retrospective cohort study across 24 million patients registered with English general practices in OpenSAFELY-TPP.

Data source
All data were stored and analysed securely using the OpenSAFELY platform, https://www.opensafely.org/, as part of the NHS England OpenSAFELY COVID-19 service. The dataset analysed within OpenSAFELY-TPP is based on GP surgeries using TPP SystmOne software. Data include pseudonymised data such as coded diagnoses, medications and physiological parameters. No free text data are included. All code is shared openly for review and re-use under the MIT open license (https://github.com/opensafely/pathology-comparators-short-report). Detailed pseudonymised patient data are potentially re-identifiable and therefore not shared.

Codelists
We conducted an initial analysis over a sample period of three days to identify common tests (≥50 records per day) where ≥25% of the records had comparators. Eleven test codes met these criteria (Table 1): three for glomerular filtration rate (GFR) tests, four for other blood tests, two for urine tests and two for faecal tests.
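The selection rule above amounts to a two-condition filter over the sample period. A minimal sketch, assuming each record is a hypothetical `(code, has_comparator)` pair and that "≥50 records per day" means the average over the three sampled days (the report does not say whether every individual day had to meet it):

```python
from collections import Counter
from typing import Iterable, List, Tuple


def select_codes(
    records: Iterable[Tuple[str, bool]],
    n_days: int,
    min_per_day: float = 50,
    min_comparator_share: float = 0.25,
) -> List[str]:
    """Return test codes that are common and frequently carry a comparator."""
    totals: Counter = Counter()
    with_comparator: Counter = Counter()
    for code, has_comparator in records:
        totals[code] += 1
        if has_comparator:
            with_comparator[code] += 1
    return sorted(
        code
        for code, n in totals.items()
        if n / n_days >= min_per_day  # common enough on average
        and with_comparator[code] / n >= min_comparator_share  # ≥25% with comparators
    )
```

Both thresholds are parameters, so the same filter can be rerun with stricter or looser criteria.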

Study population
We included patients of all ages registered with a GP in OpenSAFELY-TPP as of the start of each week over a two-week period beginning 1st October 2021, who had any test in the codelist.

Study measures
For each code in the codelist we counted the number of patients per week with the code recorded and an associated non-zero numeric result value. Zeros usually represent non-numeric results; for some tests zero values are possible, but these cannot be distinguished. Limiting to non-zero values ensures we count only those returned with a valid value. For the latest result per patient per week, we extracted the numeric result value (available to one decimal place) and the comparator associated with the result. We also extracted the upper and lower reference bounds returned alongside the result; these represent the upper and lower limits of what is considered a "normal" result. However, some tests have only one limit (e.g. a test for which any low value is "normal" and an "abnormal" result is defined only as being above a clinical threshold). Tests may have multiple possible upper and/or lower limits, as these can vary by laboratory and by patient factors such as age. Counts were rounded to the nearest 10. Percentages were calculated after rounding.
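The per-week extraction and disclosure rounding described above can be sketched as follows. This is not the OpenSAFELY study definition; it assumes a hypothetical record layout `(patient_id, date, value, comparator)`, that zero values are dropped before the latest result is chosen, and that halves are rounded up when rounding to the nearest 10 (the report does not state its tie-breaking rule).

```python
import math
from datetime import date
from typing import Dict, Iterable, Optional, Tuple

Record = Tuple[int, date, float, Optional[str]]  # (patient_id, date, value, comparator)


def round10(n: int) -> int:
    """Round a count to the nearest 10, halves rounded up (an assumption;
    Python's built-in round() would round halves to even instead)."""
    return int(math.floor(n / 10 + 0.5)) * 10


def latest_per_patient_week(
    records: Iterable[Record], start: date
) -> Dict[Tuple[int, int], Tuple[date, float, Optional[str]]]:
    """Keep the latest non-zero result per patient per study week."""
    latest: Dict[Tuple[int, int], Tuple[date, float, Optional[str]]] = {}
    for patient_id, when, value, comparator in records:
        if value == 0:  # zeros usually encode non-numeric results
            continue
        key = (patient_id, (when - start).days // 7)  # week 0, week 1, ...
        if key not in latest or when > latest[key][0]:
            latest[key] = (when, value, comparator)
    return latest


def rounded_percentage(numerator: int, denominator: int) -> float:
    """Percentage calculated after rounding both counts to the nearest 10."""
    den = round10(denominator)
    return 100 * round10(numerator) / den if den else math.nan
```

Computing percentages from rounded counts, as the report does, means published percentages can differ slightly from those based on unrounded counts.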

Overall rate of comparators present per test.
We grouped tests with no comparator, "=" or "~" together as the no-comparator group. We grouped tests with ">" or "≥" together as "≥", and similarly "<" and "≤" as "≤". We report the proportion of each test returned with comparators present.
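The comparator grouping above is a simple lookup. In this sketch the ASCII spellings ">=" and "<=" are a hypothetical addition in case symbols are stored that way; the function names are illustrative.

```python
from typing import Iterable, Optional

NO_COMPARATOR = {"", "=", "~"}
GREATER = {">", "≥", ">="}  # ">=" is an assumed ASCII spelling
LESS = {"<", "≤", "<="}  # "<=" is an assumed ASCII spelling


def group_comparator(raw: Optional[str]) -> str:
    """Map a raw comparator to one of the groups "none", "≥" or "≤"."""
    symbol = (raw or "").strip()
    if symbol in NO_COMPARATOR:
        return "none"
    if symbol in GREATER:
        return "≥"
    if symbol in LESS:
        return "≤"
    raise ValueError(f"unexpected comparator: {raw!r}")


def comparator_share(raw_comparators: Iterable[Optional[str]]) -> float:
    """Proportion of results returned with any comparator present."""
    groups = [group_comparator(c) for c in raw_comparators]
    return sum(g != "none" for g in groups) / len(groups)
```

Raising on unknown symbols, rather than silently treating them as exact, makes any unexpected comparator values visible during extraction.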
Amendments from Version 1
Changes made in this version, in response to the reviewers' comments, are minor and comprise:
• Abstract: A small clarification of mild kidney disease as "stage G2"
• Methods: Removal of a duplicated sentence
• Discussion: Added references to other issues which may affect interpretation of laboratory results in research, such as different methods, reference values and units of measurement.
• Discussion: Further caution around use of test result values as continuous variables where comparators may be involved.

Any further responses from the reviewers can be found at the end of the article

Values associated with comparators. We identified the numeric values most commonly associated with comparators in order to ascertain the important cut-off points for users studying these results. For each test, we identified numeric values with a total count >100 and ≥0.1% returned with comparators present, limited to the main comparator in use for each test (≥ or ≤). We report the proportion of results with each of these values that were returned with comparators present, and compare these values with the most common upper or lower reference bounds (as applicable) for the test.

Regional variation. We report the proportion of each test returned with comparators present by the NHS region of the practice at which each patient was registered. Some regions do not have full population coverage in the cohort, as only practices using TPP software are included.
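The value-level selection (total count >100 and ≥0.1% with the main comparator) and the regional suppression of small denominators (regions with fewer than 100 tests excluded, as in Table 5) can be sketched as below. It assumes comparators have already been grouped into "≥", "≤" or "none" as described in the Methods; the function names and input shapes are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def values_with_comparators(
    results: Iterable[Tuple[float, str]],
    main_comparator: str,
    min_total: int = 100,
    min_share: float = 0.001,
) -> Dict[float, float]:
    """For one test, map each qualifying numeric value to the share of its
    results carrying the main comparator ("≥" or "≤")."""
    totals: Counter = Counter()
    with_main: Counter = Counter()
    for value, grouped in results:
        totals[value] += 1
        if grouped == main_comparator:
            with_main[value] += 1
    shares = {v: with_main[v] / n for v, n in totals.items()}
    return {
        v: share
        for v, share in sorted(shares.items())
        if totals[v] > min_total and share >= min_share
    }


def regional_rates(
    counts: Dict[str, Tuple[int, int]], min_tests: int = 100
) -> Dict[str, float]:
    """Comparator rate per region from (with_comparator, total) counts,
    dropping regions with fewer than `min_tests` tests."""
    return {r: k / n for r, (k, n) in counts.items() if n >= min_tests}
```

For example, the East Midlands nucleated red blood cell counts reported in the Results (11,970 of 13,560) give a rate of about 88.3%.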

Software and reproducibility
Data management and further analysis were performed using Python 3. Code for data management and analysis, as well as codelists, is archived online at https://github.com/opensafely/pathology-comparators-short-report.

Patient and Public Involvement
OpenSAFELY has developed a publicly available website, https://www.opensafely.org/, through which they invite any patient or member of the public to make contact regarding the broader OpenSAFELY project.

Overall rate of comparators present per test
We included 461,430 tests over the two-week study period, 166,800 (36.1%) of which had results associated with a comparator (Table 2). GFR tests/calculations typically had ≥ comparators (31-45% of tests) but not ≤ comparators (0%), while all other included tests showed the opposite pattern. The highest rate was for Nucleated red blood cell count, with 86% of results returned with a ≤ comparator, followed by Quantitative faecal immunochemical test (72%) and Rheumatoid factor (71%). The remaining tests had 29-40% of test results with comparators (Table 2).

Values associated with comparators
For the included GFR tests, the most common results involving comparators were "≥60" and "≥90", where 90-96% of tests with numeric value 60 or 90 had a ≥ comparator (Table 3).
Values associated with comparators for the non-GFR tests are shown in Table 4. Nucleated RBC test results with values of 0.5 or 0.2 were almost always returned with ≤ comparators (99-100%). These corresponded to the common upper limit values (0.2 in 55% of tests and 0.5 in 31%).

Table 3 (caption, continued). Columns show the number of test results which appeared with the comparator, the total tests with the same numeric value with or without comparators (e.g. "60", "90"), and the calculated percentage of results with comparators. Counts are rounded to the nearest 10. The most commonly appearing "lower reference limits" (column "Most common lower reference limit term") are also shown; results above these values are considered "normal". These values are sometimes supplied alongside a test result to provide the appropriate reference limit for the patient (e.g. for their age group). The most common non-zero value is shown, alongside the percentage of tests to which it was attached.
For Rheumatoid factor, various test result values of 7-13 and 20 were commonly associated with ≤ comparators (11-100%; Table 4). This test typically had no lower reference limit (1% of tests had a lower limit of 10.0); the most common upper limits were 14.0 (46%) and 15.0 (10%).
Urine albumin had various values between 3.0 and 10.0 commonly associated with ≤ comparators (Table 4). This test typically had no lower reference limit (3% of tests had a lower limit of 3.0) or upper limit (79% of tests had none); the most common upper limit present was 20.0 (11%). Urine microalbumin tests occurred less frequently but showed a broadly similar pattern, with values of 2.0-7.0 commonly associated with ≤ comparators.

Variation by region
There was a large degree of variation between regions in the use of comparators for most tests (Figure 1, Table 5), as well as in the tests themselves; e.g. Glomerular filtration rate was widely used in only two of the nine regions (Table 5). Each region had comparators present for at least some tests, and some tests showed particularly wide variation. For example, nucleated red blood cell counts had high rates of comparator usage in two regions (e.g. 88.3%, 11,970 of 13,560, East Midlands), but in the three other regions there were no comparators (although denominators there were much smaller: 180-460).

Summary
We identified 11 common test codes in primary care data where at least one in four results was returned as a range of values, as indicated by a comparator such as ≥ or ≤, which may affect how these test results are interpreted in health data research. Three of the affected tests were related to GFR testing (reduced GFR indicates reduced kidney function), with results commonly returned as "≥60" or "≥90" rather than exact values. The other eight tests (four blood tests, two urine and two faecal tests) were returned with ≤ comparators. Reassuringly, the most common values returned with comparators were typically lower than the associated lower reference limit, so are unlikely to affect interpretation of these result values. Regional variation in the proportion of tests returned with comparators indicates different conventions between testing laboratories for some tests.

Strengths and weaknesses
Here we aimed to inform research using electronic health records to categorise patients' health status at a population level; our results are not intended to apply to the clinical interpretation of individual patients' test results, where the clinician would have richer contextual information at hand. We expect our results to be generalisable to primary care data held in other UK research platforms, as comparators, where applicable, are returned with the results from laboratories. However, we only included practices using TPP software, which are geographically clustered, so testing laboratories not covered by this analysis may have different conventions. There may be other factors which influence the use of comparators, e.g. patient age or comorbidities, where these affect the clinical threshold between normal and abnormal results. We only included a small subset of tests, but they were the most common tests in our sampling period with at least 25% of results returned with comparators.
We also covered only a short (two-week) sample of test results. There could be seasonal effects on testing patterns or results at different times of the year 4,5. Some additional information relating to test results may be supplied in free text, which is not currently available in OpenSAFELY.

Findings in context
Test results in electronic health records should always be interpreted with caution, as "abnormal" results alone do not necessarily indicate pathology, and other information (e.g. symptoms) that may be important to determine the patient's condition may not be available to the researcher. In addition, other issues may affect interpretation, such as different methods and reference values used by different laboratories 6,7, and different units of measurement 8. This report adds further clarity on applying threshold values to classify patients' health status based on test results in electronic health records research.
For the GFR-related tests, we found results commonly returned as "≥60" or "≥90". While 90+ typically represents healthy kidney function, values 60-<90 may represent mild kidney disease in combination with other signs or symptoms 9; therefore, a result of "≥60" cannot distinguish between these states. However, most commonly in health data research, and in OpenSAFELY research to date, categories are only created for moderate to severe disease (GFR <60), and these are unaffected by comparators. GFR values are also often recalculated from the raw creatinine values in electronic health records. However, some studies in the literature do make distinctions between the 60-<90 and 90+ groups 10.
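The ambiguity of "≥60" can be shown by listing every GFR category consistent with a censored result. The sketch below uses the standard KDIGO GFR bands in mL/min/1.73 m² (G1 ≥90, G2 60-89, G3a 45-59, G3b 30-44, G4 15-29, G5 <15); it maps GFR values to categories only and is not a CKD diagnosis, which for G1 and G2 also requires markers of kidney damage. The function name is hypothetical.

```python
import math
from typing import List, Optional

# KDIGO GFR categories (mL/min/1.73 m^2) as [lower, upper) bounds.
GFR_CATEGORIES = [
    ("G1", 90.0, math.inf),
    ("G2", 60.0, 90.0),
    ("G3a", 45.0, 60.0),
    ("G3b", 30.0, 45.0),
    ("G4", 15.0, 30.0),
    ("G5", 0.0, 15.0),
]


def possible_categories(value: float, comparator: Optional[str] = None) -> List[str]:
    """GFR categories consistent with a (possibly censored) GFR result."""
    if comparator in (None, "", "=", "~"):
        return [g for g, lo, hi in GFR_CATEGORIES if lo <= value < hi]
    if comparator in (">", "≥", ">="):
        # true value is at or above `value`: any band whose upper bound exceeds it
        return [g for g, lo, hi in GFR_CATEGORIES if hi > value]
    if comparator in ("≤", "<="):
        # true value is at or below `value`: any band starting at or below it
        return [g for g, lo, hi in GFR_CATEGORIES if lo <= value]
    if comparator == "<":
        return [g for g, lo, hi in GFR_CATEGORIES if lo < value]
    raise ValueError(f"unrecognised comparator: {comparator!r}")
```

A result of "≥60" is consistent with both G1 and G2, whereas "≥90" maps to G1 only; any grouping that only separates GFR <60 from ≥60 is unaffected.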
For the other eight tests we identified, no OpenSAFELY studies to date have used result values from these tests without also returning the associated comparator, but there are some implications for interpretation of these result values. Taking quantitative faecal immunochemical tests as an example, values ≥10 are typically classified as abnormal results, in combination with other information 11; however, we found the value 10 appeared with a "<" or "≤" symbol 96% of the time, indicating a normal result. Therefore, contrary to convention, 10 should be treated as a normal result for this test when analysing results in health record data.

Implications
It is important to recognise, when using laboratory test result values in electronic health records, that a numeric value does not always represent an exact measurement. One implication of the presence of comparators is that great care is needed when using these values as continuous variables. For example, the absolute change in values over time within an individual cannot always be determined, there can be clusters of results at the same or similar values, and calculating averages may not be meaningful. Users of primary care data should check the comparators returned with test result values to ensure they are interpreting them correctly, especially for the tests identified in this report and for less common tests which we have not included here.
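One way to see why averages can mislead is to substitute each censored result by its reported bound and note what the resulting mean can and cannot tell us. This is an illustrative sketch, not a method from this report: if every comparator points upwards (">" or "≥") the substituted mean can only underestimate the true mean, if every comparator points downwards it can only overestimate it, and with both directions (or unrecognised comparators) it gives no bound at all. Names are hypothetical.

```python
from statistics import mean
from typing import Iterable, Optional, Tuple

EXACT = {None, "", "=", "~"}
UPWARD = {">", "≥", ">="}  # true value at least the reported number
DOWNWARD = {"<", "≤", "<="}  # true value at most the reported number


def substituted_mean(results: Iterable[Tuple[float, Optional[str]]]) -> Tuple[float, str]:
    """Mean after treating each censored result as its reported bound, and
    what that mean represents relative to the (unknown) true mean."""
    results = list(results)
    comps = {c for _, c in results}
    m = mean(v for v, _ in results)
    if comps <= EXACT:
        kind = "exact"
    elif comps <= EXACT | UPWARD:
        kind = "lower bound"  # every censored value can only be larger
    elif comps <= EXACT | DOWNWARD:
        kind = "upper bound"  # every censored value can only be smaller
    else:
        kind = "no bound"  # censoring in both directions, or unknown symbols
    return m, kind
```

For GFR results containing many "≥60" and "≥90" values, the substituted mean is therefore only a lower bound on the average kidney function of the group.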

Summary
Some test result values in electronic health records are commonly returned with comparators, indicating a range of possible values. For the common tests we identified, we expect minimal implications for researchers in most use-cases. However, we found that for several GFR test codes, the result values cannot be used to accurately distinguish mild kidney disease from healthy kidney function.
In general, users of test result values in electronic health records should take care when using threshold values to classify a patient's condition and consider extracting any associated comparators alongside numeric values.

Information governance and ethical approval
NHS England is the data controller of the NHS England OpenSAFELY COVID-19 Service; TPP is the data processor; all study authors using OpenSAFELY have the approval of NHS England 12. This implementation of OpenSAFELY is hosted within the TPP environment, which is accredited to the ISO 27001 information security standard and is NHS IG Toolkit compliant 13.
Patient data has been pseudonymised for analysis and linkage 16.
In some cases of data sharing, the common law duty of confidence is met using, for example, patient consent or support from the Health Research Authority Confidentiality Advisory Group 17.
Taken together, these provide the legal bases to link patient datasets using the service.GP practices, which provide access to the primary care data, are required to share relevant health information to support the public health response to the pandemic, and have been informed of how the service operates.
This study was approved by the Health Research Authority (REC reference 20/LO/0651) and by the LSHTM Ethics Board (reference 21863).

Data availability
Source data
Access to the underlying identifiable and potentially re-identifiable pseudonymised electronic health record data is tightly governed by various legislative and regulatory frameworks, and restricted by best practice. The data in the NHS England OpenSAFELY COVID-19 service are drawn from General Practice data across England where TPP is the data processor.
TPP developers initiate an automated process to create pseudonymised records in the core OpenSAFELY database, which are copies of key structured data tables in the identifiable records. These pseudonymised records are linked onto key external data resources that have also been pseudonymised via SHA-512 one-way hashing of NHS numbers using a shared salt. University of Oxford, Bennett Institute for Applied Data Science developers and PIs, who hold contracts with NHS England, have access to the OpenSAFELY pseudonymised data tables to develop the OpenSAFELY tools.
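Salted one-way hashing lets two datasets be linked without exchanging NHS numbers: if both sides hash the same number with the same secret salt, they obtain the same pseudonym and can join on it. The sketch below uses Python's standard `hashlib.sha512`; the way the salt and NHS number are combined (salt prepended, digits only, hex output) is an assumption for illustration, not the service's documented format, and the NHS number and salt in the usage note are hypothetical.

```python
import hashlib


def pseudonymise(nhs_number: str, salt: str) -> str:
    """One-way SHA-512 pseudonym for an NHS number under a shared salt.

    ASSUMPTION: the salt is prepended to the digits-only NHS number and the
    digest is hex-encoded; the service's actual input format is not described.
    """
    digits = "".join(ch for ch in nhs_number if ch.isdigit())
    return hashlib.sha512((salt + digits).encode("utf-8")).hexdigest()
```

With a hypothetical salt, `pseudonymise("123 456 7890", salt)` and `pseudonymise("1234567890", salt)` give the same 128-character token, so records can be joined on it, while a different salt yields an unrelated token.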
These tools in turn enable researchers with OpenSAFELY data access agreements to write and execute code for data management and data analysis without direct access to the underlying raw pseudonymised patient data, and to review the outputs of this code. All code for the full data management pipeline (from raw data to completed results for this analysis) and for the OpenSAFELY platform as a whole is available for review at github.com/OpenSAFELY.
The data management and analysis code for this paper was led by HJC. Contact: helen.curtis@phc.ox.ac.uk.
Clear and transparent methods, including links to GitHub code, are provided. The authors acknowledge the main limitation, which is that they only looked at a small subset of tests (with at least 25% of results returned with comparators).
In the discussion section 'Findings in context' the authors describe the potential implications of these findings for researchers seeking to dichotomise continuous test results from OpenSAFELY into binary (normal/abnormal) results. This is an important issue, particularly for FIT tests, as they have highlighted.
However, other issues, such as the variation between different laboratories in cut-offs for 'abnormal' results and different units of measurement (e.g. for HbA1c and Hb), are probably a more prevalent problem for researchers trying to categorise test results in this way.
I would also like the authors to consider the implications for researchers seeking to use continuous test results in analyses: for example, summary measures of test results such as means will be misleading if test results which include a comparator are treated as numeric results. This can usually be identified by exploring continuous results with a histogram to identify any unexpected peaks which might occur in association with a comparator.
I would like to see this contextualised as part of a wider issue in EHR research: the importance of carefully exploring and understanding your data prior to analysis. I thoroughly support the OpenSAFELY team's approach of standardising code to identify recurring issues such as this, and to ensure transparency and rigor in EHR research.

Is the work clearly and accurately presented and does it cite the current literature?
Yes

Is the study design appropriate and is the work technically sound?
Yes

Are sufficient details of methods and analysis provided to allow replication by others?
Yes

If applicable, is the statistical analysis and its interpretation appropriate?
Yes

Are all the source data underlying the results available to ensure full reproducibility?
Yes

Are the conclusions drawn adequately supported by the results?
Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: General practitioner with expertise in diagnostic testing and electronic health records research.
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Major Issues
In the conclusions section of the abstract, referring to GFR 60-90 as "mild kidney disease" is too vague; the word "mild" also appears in association with a GFR of 45-59 (see https://ukkidney.org/health-professionals/information-resources/uk-eckd-guide/ckd-stages). Perhaps change to "stage G2 CKD".

Minor Issues
Might it be worth making the point specifically about the need to check for the presence of comparator values if using the tests identified as continuous variables in, e.g., prediction models?

Typos/formatting
The final sentence of Methods>Data Source is duplicated.

Is the work clearly and accurately presented and does it cite the current literature?
Yes

Is the study design appropriate and is the work technically sound?
Yes

Are sufficient details of methods and analysis provided to allow replication by others?
Yes

If applicable, is the statistical analysis and its interpretation appropriate?
Yes

Are all the source data underlying the results available to ensure full reproducibility?
Yes

Are the conclusions drawn adequately supported by the results?
Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: I'm an NHS consultant nephrologist with a PhD in genomics and some experience in large clinical data projects.
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Figure 1. Comparator rates for each test by NHS region. Scale indicates the proportion of each test returned with a comparator (≥, >, ≤ or <) during the study period (1-14 October 2021). White squares represent denominators <100 tests. Only tests with non-zero result values are included. Values are shown in Table 5. Some regions do not have full population coverage in the cohort due to mixed practice software use: in particular London, the West Midlands and South East have less than 20% coverage in OpenSAFELY-TPP; North East and North West are less than 50% 3.

Reviewer Report 31 January 2024
https://doi.org/10.21956/wellcomeopenres.21991.r70395
© 2024 Oates T. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Tom Oates, Barts Health NHS Trust, London, UK
Summary
In this study the authors use the OpenSAFELY platform to retrospectively examine the EHRs of 24m patients to find results returned with a comparator. 11 test codes showed ≥25% of records with comparators. These tests encompassed blood, urine and faecal tests. GFR tests were returned with ≥ comparators and the remaining tests with ≤ comparators. Numeric data on how often these test results are returned with comparators, and on geographic variation in results, are reported.

Table 1. Tests included by code (SNOMED code) and term (SNOMED description). Codes are grouped into four types and the common use of the test is shown.

Table 2. Count and rate of comparators per test over the two-week study period. Counts are rounded to the nearest 10.
(E)GFR = (Estimated) Glomerular filtration rate; CKDEC = Chronic Kidney Disease Epidemiology Collaboration equation, per 1.73 square metres; MDRD = Modification of Diet in Renal Disease Study Group calculation.

Table 3. Most common comparator results for each included GFR test. For each test, rows are divided into each of the commonly occurring "comparator" results, e.g. "≥60", "≥90".

Table 4. Most common comparator results for each included non-GFR test. For each test, rows are divided into each of the commonly occurring "comparator" results, e.g. "≤0.2", "≤0.5". Columns show the number of test results which appeared with the comparator, the total tests with the same numeric value with or without comparators (e.g. "0.2", "0.5"), and the calculated percentage of results with comparators. Counts are rounded to the nearest 10. The most commonly appearing "upper reference limits" are also shown; results below these values are considered "normal". These values are sometimes supplied alongside a test result to provide an appropriate reference limit for the patient (e.g. for their age group). The two most common non-zero values are shown, alongside the percentage of results to which they were attached.


Table 5. Comparator rates for each test by region. Results show the number of test results returned with a comparator, the total number of tests and the corresponding percentage. Counts are rounded to the nearest 10. Regions are excluded for individual tests where fewer than 100 total tests were recorded over the two-week study period. Only tests with non-zero results are counted. Some regions do not have full population coverage in the cohort due to mixed practice software use: in particular London, the West Midlands and South East have less than 20% coverage in OpenSAFELY-TPP; North East and North West are less than 50% 3.