FHIR-standardized data collection on the clinical rehabilitation pathway of trans-femoral amputation patients

Lower limb amputation is a medical intervention which causes motor disability and may compromise quality of life. Several factors determine patients’ health outcomes, including an appropriate prosthetic provision and an effective rehabilitation program, necessitating a thorough quantitative observation through different data sources. In this context, the role of interoperability becomes essential, facilitating the reuse of real-world data through the provision of structured and easily accessible databases. This study introduces a comprehensive 10-year dataset encompassing clinical features, mobility measurements, and prosthetic knees of 1006 trans-femoral amputees during 1962 hospital stays for rehabilitation. The dataset is made available in both comma-separated values (CSV) format and HL7 Fast Healthcare Interoperability Resources (FHIR)-based representation, ensuring broad utility and compatibility for researchers and healthcare practitioners. This initiative contributes to advancing community understanding of post-amputation rehabilitation and underscores the significance of interoperability in promoting seamless data sharing for meaningful insights into healthcare outcomes.


Background & Summary
Major lower limb amputation (i.e., amputation at or proximal to the ankle joint) is a condition affecting about 25 new cases per 100,000 persons each year in European countries 1,2 .It may be the result of different causes, including trauma, vascular diseases, malignancies, and infections 1,3,4 .The amputation invariably affects other physical health aspects of the amputee, produces consequences on their psychological sphere 5 , and causes profound changes in their personal, occupational, and social life 6,7 .People with lower limb amputation commonly exhibit reduced quality of life 8,9 , possibly suffer from pain 10 , residual limb ulcers, gait impairment 11 , and increased risk of falling 12,13 .Anxiety and depression, but also positive psychological transformations, may follow this anatomical loss and life reorganization 5 .An appropriate prosthetic provision and an effective gait and balance rehabilitation program are fundamental for preserving the quality of life 14,15 .Other factors affecting health outcomes and quality of life include age, level of amputation, medication intake, and comorbidities.
While some studies have gathered data on people with amputations and associated clinical details 16,17 , very few datasets have been made publicly available.In 2020, Hood and colleagues have published kinetic and kinematic data from 18 trans-femoral amputees 18 .Furthermore, an early version of the dataset presented here was made available 19 as part of a study investigating safety of prosthetic knees against falls 20 .A critical need exists to establish a structured database that ensures interoperability and consolidates diverse sources of information with systematic methodologies.Such a database would serve as a potent resource for studying the role of various factors influencing rehabilitation outcomes, prosthetic knee selection, and adverse events.
The standardization of datasets has garnered significant attention in recent years 21,22 , as the adoption of standardized data formats facilitates the sharing and reuse of health data 23 .Within the realm of healthcare data, the Health Level 7 (HL7) organization has developed the Fast Healthcare Interoperability Resources (FHIR) standard 24 , enabling efficient healthcare data exchange with adaptability.Its broad applications, from clinical trials to hospital information systems and public health data streams, significantly contribute to streamlining communications and improving patient care processes 25 .It supports continuity of care at all health system levels, regardless of the software used 26 .The adoption of FHIR as a standard for healthcare data exchange bears several notable advantages in the context of the secondary reuse of real-world data for medical data science, such as cost-effectiveness, increasing quality, and high flexibility of the analysis 27 .
To this end, we introduce the MOTU dataset, a comprehensive dataset of structured data related to rehabilitation hospital stays of patients who have undergone trans-femoral amputation.Data are provided in comma-separated values (CSV) format and in human/machine-readable format employing the FHIR data standard.This dataset is designed to provide valuable insights into the rehabilitation process, contributing to a better understanding of factors influencing patient outcomes and fostering advancements in care for this specific patient population.
This work has been conducted under MOTU and MOTU++, two research projects on trans-femoral amputees and prosthetic devices funded by the Italian National Institute for Insurance against Accidents at Work (INAIL).

Methods
Participants.This dataset is the result of a retrospective, observational study conducted at INAIL Prosthesis Centre in Budrio, Italy 20,28 .The INAIL Prosthesis Centre integrates a rehabilitation hospital, a research center, and orthopedic laboratories, assisting every year more than 1500 individuals with amputations resulting from occupational injuries and various other underlying causes.
We included all hospital stays for rehabilitation training of individuals with unilateral trans-femoral amputation or knee-disarticulation, aged 18 years or more, during the period January 2011-May 2020, with signed informed consent by the patient for data treatment for research purposes.We excluded 338 hospital stays because the signed informed consent was either not found (134) or refused (204), representing 14.7% of all the eligible hospital stays.
The study was approved by the Ethics Committee "Area Vasta Emilia Centro" (ref.MOTU 18088, CE AVEC n. 380/2018/OSS/AUSLBO) and complies with the Declaration of Helsinki.
We included 1962 hospital stays of 1006 individuals.Men accounted for 90.9% of all hospitalizations (1784 hospital stays relating to 874 patients); the age range spanned from 18 to 91 years (mean 58, SD 14.4 years).About half of the hospital stays (49.6%, 973) were related to individuals with amputation due to trauma.Slightly more than 20% (21.4%, 420) were hospital stays for rehabilitation training after the first prosthetic provision.The over-representation of men in this cohort is due to two factors: firstly, major lower limb amputations occur more frequently on men than in women 4,29 ; secondly, this gender imbalance is even more pronounced for work-related amputations, which in our dataset account for 74% of the total hospital stays 30 .

Data collection.
We created the CSV dataset by merging information from patients' rehabilitation pathway and some external sources.The rehabilitation pathway is schematically shown in Fig. 1.Hospitalizations for rehabilitation training were either for first prosthetic fitting or prosthesis renewal.The former consists of those hospital stays in which the patient received the first prosthetic provision, used during the whole hospitalization at the Centre.The latter represents hospital stays of those patients who returned to the Centre for further rehabilitation after the substitution or significant revision of the prosthesis or its main components (namely the socket, the knee, or the foot).
Three main reference time points can be identified within each hospital stay: • T0.It represents the patient's admission to the Centre.At this time, a comprehensive assessment of the patient's clinical and functional status is conducted to ascertain baseline conditions and delineate a personalized rehabilitation pathway; • T1.It denotes the exit from the parallel bars during the hospital stays for first prosthetic fitting, and it corresponds to T0 in hospital stays for prosthesis renewal; • T2.It is the discharge, the moment when the patient ends his/her hospitalization.
The MOTU dataset covers the following areas (Fig. 2): • Clinical evaluations.They consist of the assessment of anthropometric measures, the reason for the current hospitalization, the patient's medical history, information regarding the amputation (i.e., amputation date, side, cause, and the residual limb length), a pain evaluation, and falls occurring during the hospital stay.A fall was defined as a "sudden, unintentional, and unexpected descent from upright, seated, or clinostatic position" 31 .Each fall was registered following the Italian Ministry of Health's recommendation on fall prevention and management in healthcare settings 32 ; • Administrative information.It reports information about the third-party payer of the hospitalization: whether the INAIL institute (for work-related injuries), the national health system, or the patient him/herself; • Functional tests.The patients were assessed for their functional abilities with the following tests: • the 10-m Timed Walking Test (TWT) 33 was executed at T1 and T2.Each TWT was executed twice, at comfortable gait speed, over a 14-m clear path with four marks at 0, 2, 12, and 14 meters.A physiotherapist recorded the time and the number of steps taken between the two intermediate marks.
• The Amputee Mobility Predictor (AMP) 34 was executed at T0 and T2 since 2016.It measures the functional mobility of a person with lower-limb amputation, including gait and several tasks related to static and dynamic equilibrium.It was administered without wearing the prosthesis (AMPnoPRO) on amputees at T0 for their first prosthetic fitting and while using the prosthesis (AMPPRO) in all other cases; • Questionnaires.Patients were asked to respond to the following questionnaires: • Barthel index 35 , assessing the independence level in activity of daily living.It was administered at T0, during the hospital stay, and at T2; • Morse scale 36 , used to determine inpatients' fall risk.It was administered at T0 and T2 until 2017; Fig. 2 Operational pipeline.Different data sources related to different areas (clinical evaluations, administrative information, functional tests, questionnaires, prosthetic knee, and drug prescription) were merged and anonymised to create the CSV dataset.We then mapped the variable of the CSV dataset into corresponding FHIR resources, identified appropriate coding systems, and validated the FHIR resources (standardization pipeline), thus creating the FHIR MOTU dataset.
• Emilia-Romagna Region (ERR) Survey for multifactorial risk assessment for falls in the hospital, developed by the Emilia-Romagna Region in the "Falls prevention in older people" plan 37 .It was administered in T0 and T2 since 2017, substituting the Morse scale; • Locomotor Capability Index with 5-level ordinal scale (LCI-5) 38 , specifically designed and validated on persons with lower-limb amputation, assesses the patient's perceived ability to carry out 14 locomotor activities of daily living while wearing a prosthesis.It was administered at T1 and T2.It has been substituted by AMP in 2016 because it exhibited a ceiling effect in patients with high functional abilities and did not distinguish among different types of walking aid used.
• Prosthetic knee.We collected information about the prosthetic knee used by each patient at each hospital stay.We further recorded in the MOTU dataset some characteristics of the prosthetic knees as reported in the manufacturers' websites.Based on these characteristics, we also categorized the prosthetic knees into four groups: (i) prosthetic knees used in locked configuration during walking (LK); (ii) articulating mechanical knees without fluid control (AMK); (iii) non-electronic, fluid-controlled knees (FK); and (iv) microprocessor-controlled knees (MPK).• Drug prescription.We collected information on all the drugs administered to each patient during the hospital stays.We mapped the Italian drug trade names to the related Anatomical Therapeutic Chemical (ATC) code 39 according to their main active ingredient.This mapping was supported by tables made available by the Italian Medicines Agency (AIFA 40 ) and by manual search over DrugBank 41 .
anonymization.We generated an anonymous ID for each patient in the dataset.We excluded from the dataset any variable containing name, surname, fiscal code of the patients or with text in natural language.We further shifted each patient's dates by a random number of days between −90 and +90 to make deidentification stronger while allowing data analyses on secular trends.
FHIR standardization.Two experts proficient in HL7 FHIR R4 independently annotated the variables of the CSV dataset into corresponding FHIR resources (Fig. 2).This phase was followed by a discussion on resolving discrepancies in the mapping process, leading to establishing an agreed resource mapping.Once all variables were successfully mapped, per HL7 recommendations, we identified appropriate coding systems for some of the MOTU dataset variables.We adopted a modular template approach using MatchBox presented in a previous study 42 and particularly regarding on questionnaires we designed a template to aggregate all the item scores from each scale.Finally, we defined customized Search Parameters and used those outlined in the FHIR specification to map data related to counting instances or detecting the presence or absence of specific variables.These resources are defined in terms of FHIRPath expressions and, upon integration into the FHIR server, can be leveraged within the FHIR Search application programming interface (API).

Data Records
The CSV 43 and FHIRed 44 datasets are available for access and utilization at the Zenodo Repository.

CSV Dataset organization.
The CSV dataset consists of five different tables, and its overall structure is presented in Table 1.
• Patient.This table relates the 1006 anonymous ID to the patient's birth date and sex.
• HospitalStay.This table contains information about the hospital stays for rehabilitation training, including clinical evaluations, administrative information, and outcomes from functional tests and questionnaires, whose detailed information are provided in Table 2.Each row of this table represents a different hospital stay, identified by the unique combination of the anonymous patient ID and admission date, counting for a total number of 1962 entries.• ProstheticKnee.This table provides technical features for about 40 distinct prosthetic knee models employed by the patients during their hospital stays.Technical features include name of the manufacturer, possibility to manually lock the knee, polycentric design, hydraulic or pneumatic control, electronic (microprocessor) control, knee category (i.e., AMK, FK, LK, or MPK), weight of the device, maximum weight allowed for the patient, patient activity level, and link to the manufacturer's webpage.• Fall.This table presents 146 entries on information about falls experienced by the patients during their hospital stays: date, whether the patient was wearing a prosthetic knee or not, reported injuries, whether it was a near fall 45 , or the activity carried out at the moment of falling.• Drug.This table lists 3032 entries about all the drugs administered to the patients during their hospital stays.
Each hospital stay is identified by the patient anonymous ID and the hospital admission date.Each drug is identified by its trade name in Italy and is associated to the ATC code of its main active i n g r e d i ent.

Standardization in HL7-FHIR.
From the initial set of 157 variables characterizing the CSV dataset, we successfully determined the correspondence with FHIR for 155 variables (98.7% of the dataset).We mapped 143 variables (91.1%) into FHIR resources and 12 variables (1.3%) derivable from FHIR Search Query as shown in Fig. 3.We failed to map two variables (3%) of the overall dataset: HFall and StumpLength in the Hospital Stay table, because it was not possible to identify any ontology associated with any FHIR R4 resource that describes the concept expressed by these variables.The list of variables and the related FHIR mapping is depicted in Table 1.
The overall mapping procedure generated 18 distinct resources and 12 FHIR Search Queries. Figure 4 illustrates

Retrospective MOTU dataset
Retrospective MOTU dataset on FHIR  46 , LOINC 47 , the Anatomical Therapeutic Chemical (ATC) classification system 48 , the Unified Code of Measure (UCUM) 49 , the International Classification of Disease 10 th edition (ICD-10) 50 , and the NCI Thesaurus (NCIt) 51 .Where a direct linkage with the dictionaries mentioned above was absent, custom code systems were introduced for comprehensive coverage.A total of 2 CodeSystems were generated to represent the following concepts: 1. Motu-encounter-id.This code system delineates the identifier linked to each Encounter resource.Its values denote the unique combination of anonymous patient ID and admission date that identifies each hospital stay.2. Motu-prosthetic-knee-properties.This code system outlines the technical features of the prosthetic knees (e.g., ManualLock, Polycentric, MPK, etc.) represented through DeviceDefinition resources.
technical Validation Data check.We checked the distributions of continuous and categorical variables with histograms and tables.
We further performed checks on variables representing dates, ensuring consistency between admission and discharge dates and all the dates of tests or questionnaires administered during the hospital stays.Any data inconsistency was solved by confronting paper-based and electronic medical records and applying clinical reasoning.We deleted values of any inconsistency that could not be solved.We performed accuracy and completeness checks on the data collected by students or research assistants.

FHIR validation.
The FHIR validation process encompassed syntactical and semantic validation to ensure the integrity and adherence of the dataset to the FHIR standard.Syntactical validation, performed by Matchbox, rigorously examines the structural correctness of the generated FHIR resources.At the same time, semantic validation, implemented by a dedicated module integrated into the HAPI FHIR server, assesses the resources' conformity to FHIR profiles.Leveraging the Structure Definitions, which describe the data schema, both levels of validation ensure the compliance of the produced FHIR resources with the specification, realizing a reliable and accurate method for clinical data standardization.A dataset on the persons with lower limb amputation can be found in the publication of Hood and colleagues 18 .This dataset consists of full-body biomechanics data acquired with a motion capture system and demographic and clinical data.Despite the innovative impact of such a dataset, it collects information on a small group of patients (18) and does not provide insights on clinical aspects, adverse events, or technical features on prosthetic devices.To the best of our knowledge, there is no public dataset on people with lower-limb amputation followed in their clinical pathway.The present dataset can be exploited for a variety of purposes, such as improving the personalization of prosthetic choice based on several personal and health-related characteristics or better understanding possible relationships between falls and prosthetic knees.Dataset enquiry.We applied two distinct methodologies to query the dataset, utilizing both the CSV and FHIR formats.This approach aimed to validate the consistency of information resulting from the conversion between CSV format and the FHIR resource representation by comparing the number of instances obtained by the two data sources.In querying the CSV format, we utilized Python programming language.On the other hand, for the FHIR resource, we converted the dataset into a Resource Description Framework (RDF) graph, employing SPARQL as the default querying language for data represented in RDF 52 .In selecting queries, we leverage the expertise of two co-authors (P.R., A.D.), who are clinicians associated with the INAIL Centre.They identified five different queries recognized as clinically meaningful.Table 3 provides a concise summary of the queries and their corresponding results.

Usage Notes
For the FHIRed dataset, interested parties and researchers can download an NDJSON formatted version and import it into an FHIR Server through the Bulk Data API.The repository's documentation section provides more detailed instructions on utilizing the dataset.The MOTU dataset is valuable for clinicians and researchers focusing on the rehabilitation pathway for lower limb amputation.Its richness encompasses a broad spectrum of information, including clinical, administrative, drug-related, prosthetic knee details, and functional mobility test assessments, among other parameters.It enables a comprehensive analysis of this particular patient group.Moreover, since it is machine-readable, automatic processing pipelines can be enabled by expert engineers employing FHIR data standard.
Although there are no official statistics, we expect that in Italy, during the years 2011-2020, the vast majority of people with trans-femoral amputation for work-related injuries chose the INAIL Prosthesis Center for rehabilitation training, as it was the only one offering this service.Contrariwise, only a minor fraction of non-work-related trans-femoral amputees have undergone rehabilitation at this center.Since work-related amputees are over-represented, the MOTU dataset as such cannot be considered representative of the whole Italian population of trans-femoral amputees.However, separate analyses on these two subpopulations can be done using the information on the third-party payer (CSV dataset: table HospitalStay, variable ThirdPayer; FHIRed dataset: field Account.Coverage (pointing to the Coverage resource)).Hospital stays for work-related  amputations can be unambiguously identified as those subsidized by INAIL, while for the others the payer is annotated to be either the Local Health Service (ASL) or private (Table 1).The time from amputation to receipt of the first prosthesis, which is pivotal to address numerous clinical questions, can be derived from the amputation date (CSV dataset: table HospitalStay, variable AmputationDate; FHIRed dataset: field Procedure.occurrence_x_)and the admission date (CSV dataset: table HospitalStay, variable AdmissionDate; FHIRed dataset: field Encounter.dtartDate)for first prosthetic fitting (CSV dataset: table HospitalStay, variable FirstdeliveryRenewal="FirstDeliv"; FHIRed dataset: those instances with empty Encounter.hospitalization.reAdmissionfield).
The number of comorbidities is available in tableHospitalStay, variable NComorbidities.It was estimated from the number of fields filled with pathological annotations in the electronic health record section dedicated to the physical examination 20 .Other direct or indirect information about the medical comorbidities of the patients can be found in the Morse Scale (CSV dataset:

Fig. 3
Fig. 3 Percentage of standardization into HL7_FHIR of the different original CSV dataset.

Fig. 4
Fig. 4 FHIR resources.The color map refers to the different sources of information depicted in Fig. 2. Light blue: clinical evaluations, orange: administrative info, yellow: functional test, blue: questionnaires, grey = prosthetic knee, light yellow = drug prescription.

34 Q3.
many patients who came for an initial supply return for a second hospitalization?85 85 Q2.How many patients transition from the initial supply of a mechanical knee to a second supply of an electronic knee?34 How many patients use anxiolytics/antidepressants (ATC Code: N06A and N05A) related to the risk of falls?67 67 Q4.Hospital stays from patients over 65 years old 674 674 Q5.Which type of knee (i.e., AMK, FK, LF, MPK) were worn by patients who experienced a fall?
CarePlan, AdverseEvent, MedicationStatement, DeviceUsage, Account, and Consent.We employed established and widely used dictionaries, namely SNOMED CT Continued the relationship among FHIR resources.The fundamental pillar of this dataset revolves around the hospital stay, which is modeled as an Encounter resource with a reference to a Patient resource.The Patient-Encounter couple of aggregated resources is in turn referenced by different resource types, including QuestionnaireResponse, T a b le 1.Data descriptor of the MOTU dataset.a Resource reference attribute ("->" notation).
19mparison with published datasets.An older version of the CSV dataset has already been published19, in CSV format only.It consisted of one table with 31 variables about 1486 hospital stays occurred between 2011 and 2017.The dataset presented here consists of five tables with a total of 157 variables.It covers 1962 hospital stays occurred between 2011 and 2020 In addition, we make it available in the FHIR standard format.

Table 2 .
Description of questionnaire entries.

Table 3 .
Number of instances for five different queries on CSV and FHIR datasets.