JCPM Research Project FHIR® Data Model
By: Alex Baumberg
In this article, we will outline our first research project in a series of Cross-institutional Research Projects mediated by the HL7® FHIR® standard – "High-risk pregnancy (CMV CGM)," a collaboration between Outburn and The Jerusalem Center for Personalized Computational Medicine (JCPM), centered around mapping the variable pool of high-risk pregnancy and Delivery to FHIR resources at three healthcare organizations: Hadassah Medical Center, Shaare Zedek Medical Center (SZMC) and Meuhedet HMO.
Introduction
Precision medical research relies on the ability to map the span of diseases and treatments across the population. Today, digital information is available to make such investigations possible. However, studies on medical records in the context of big data occasionally require backfilling data from several sources into a single medical file. In this project, we aim to map variables for research purposes, where the focal point is achieving standardization across medical institutions.
This application centers around mapping the variable pool of high-risk pregnancy and Delivery, with a background of Cytomegalovirus (CMV) in the mother, to FHIR® resources at three healthcare organizations: Hadassah, SZMC, and Meuhedet HMO. The study is based on monitoring high-risk pregnancy and the medical observations of the subsequent Delivery and the newborn's health. The medical file traces the pregnancy timeline through encounters and their related medical observations. The labor procedure is then covered by combining data from inpatient care, spanning information on the mother and the related person(s) involved, namely the newborns.
Please note that any capitalized name in this article indicates that it is a FHIR® profile, as opposed to a person or entity. For example, Patient-Mother is a profile describing a mother. In addition, profile names in this article conform to the names that appear in the diagram in the data model section but might not represent the real-world resource name. This is for readability and comprehension purposes.
Main project objectives:
· Creation of a unified FHIR-based data model that describes and maps parameters for "high-risk pregnancy (CMV CGM)" in each of the organizations
· Design and implementation of a methodology that enables production data preparation, mapping and transformation into the corresponding FHIR resources for further injection into the FHIR-server in each organization, using a unified FHIR data model true for all participating organizations in the current project.
· Data retrieval and deidentification process on-demand, according to the previously approved deidentification profile and the target population as per research requirements. The detailed description can be found in the published article: "Managing inter-organizational research projects with FHIR-based architecture."
· Preparation of de-identified data for further use by a distributed Federated Machine Learning model.
· Creating a technology platform that allows for the reuse of project products for future research projects.
Methodology:
The starting point of the project was to define a FHIR implementation methodology suitable for a research project, with steps that differentiate it from the methodology used for defining and implementing clinical FHIR-based projects.
One of the main differences between clinical and research projects is that the data model building approach is based on the research requirements. The primary source of the data, in this case, is a BI platform, which prepares the target data according to the research requirements by retrieving the production data from clinical and administrative systems. Another difference is the definition of dedicated FHIR profiles based on standard resources defined by the HL7 FHIR standard. In the case of research projects, FHIR profiles were specified to keep atomic values along with aggregate/summary information initially prepared within the BI platform. We will take a deep dive into one of the implemented profiles and present the profile definition within "The FHIR collaboration platform" - https://simplifier.net/
Now, we will discuss the main aspects of the deployment methodology:
1. Initially, we made a deep dive into the research requirements in close collaboration with the clinical researcher. The target was to get a clear understanding of each business value and its type (e.g., code/number/text).
2. The second step was to explore external resources for existing FHIR Implementation Guides suitable for our project. A real example from the current project was exploring the UK NHS Maternity and Netherlands BirthCare implementation guides. The target was to correlate clinical profiles with the research requirements and partially reuse certain parts of the implementation guides for our needs instead of reinventing the wheel and building the FHIR data model specification from scratch. This approach paid off: part of our project's "Birth Delivery" data profiles are based on both implementation guides. This is the power of FHIR as an open standard.
You can review implementation guides explored in the current project here:
https://developer.nhs.uk/apis/digital-maternity-1-0-0/
https://informatiestandaarden.nictiz.nl/wiki/Gebz:V1.0_FHIR_IG
3. The third step was to group research requirements into logical and meaningful groups and define the most appropriate FHIR resources. For example, all values holding the pregnancy summary information were grouped together and mapped to the Observation resource specified by the newly created profile – Pregnancy Summary.
4. The next step of the implementation methodology was to define an appropriate element or object within the specific FHIR resource according to the logical definition of the source data value. The preferred approach, in this case, was to find standard elements defined by the FHIR standard that match the source data types defined in the first steps:
i. Atomic values (e.g., Patient ID),
ii. Aggregated /Summary values (e.g., number of past pregnancies)
iii. CodeableConcept-based values (e.g., a LOINC code that defines the Outcome of Pregnancy).
5. Finally, we defined references between previously created resources to create a hierarchical data model, describing logical connections between the different groups of data characterizing the research requirements (e.g., “Observation-Pregnancy Summary” is linked to the Patient-Mother).
One more example: Patient-Child resources are linked to Patient-Mother using the RelatedPerson resource, defining the Patient-Mother as the biological mother of the children.
The diagram below represents the relationship between Patient Mother and Patient Child (two children):
6. In this step, Outburn and the organizational BI team wrote a Source-To-Target specification document that defines mappings between source values extracted from the clinical system and FHIR resources and elements. We also agreed on the logical IDs (the FHIR resource primary keys) and the FHIR server injection approach (e.g., using Bundles that aggregate several resources, with the previously specified references between them, into a transaction message sent over the RESTful services).
7. The next step was to create FHIR objects and inject them into the organizational research FHIR server. This step, in general, is performed by the organizational ESB team with Outburn's close support.
8. The final step was to obtain an approved de-identification profile at the element level, which is required for executing the de-identification process.
Other steps, such as FHIR server configuration, conformance resource population, and data validation, are out of scope for the current article.
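To make steps 5 and 6 more concrete, here is a minimal Python sketch of a transaction Bundle that carries a mother, a child, the RelatedPerson linking them, and a summary Observation. It is an illustration under stated assumptions, not the project's actual payload: the base URL, all logical IDs, the use of PUT requests, the Patient.link entry on the mother, and the "NMTH" (natural mother) relationship code are our assumptions, chosen as one common way to express such a link in FHIR.

```python
import json

# Hypothetical base URL of an organizational research FHIR server.
BASE_URL = "https://fhir.example.org/fhir"


def build_transaction_bundle(resources):
    """Wrap resources in a transaction Bundle with one PUT entry per resource.

    PUT with an agreed logical ID makes the upload repeatable: sending the
    same Bundle again updates the resources instead of duplicating them.
    """
    entries = []
    for resource in resources:
        relative_url = f"{resource['resourceType']}/{resource['id']}"
        entries.append({
            "fullUrl": f"{BASE_URL}/{relative_url}",
            "resource": resource,
            "request": {"method": "PUT", "url": relative_url},
        })
    return {"resourceType": "Bundle", "type": "transaction", "entry": entries}


# All IDs below are hypothetical examples of agreed logical IDs.
mother = {
    "resourceType": "Patient",
    "id": "mother-001",
    # Assumption: the mother's record points at the RelatedPerson that
    # represents her in the context of her child.
    "link": [{"other": {"reference": "RelatedPerson/mother-001-child-001"},
              "type": "seealso"}],
}
child = {"resourceType": "Patient", "id": "child-001"}
mother_of_child = {
    "resourceType": "RelatedPerson",
    "id": "mother-001-child-001",
    "patient": {"reference": "Patient/child-001"},
    "relationship": [{"coding": [{
        "system": "http://terminology.hl7.org/CodeSystem/v3-RoleCode",
        "code": "NMTH",
        "display": "natural mother",
    }]}],
}
pregnancy_summary = {
    "resourceType": "Observation",
    "id": "pregnancy-summary-001",
    "status": "final",
    "code": {"text": "Pregnancy summary"},  # placeholder, not the real profile code
    "subject": {"reference": "Patient/mother-001"},
}

bundle = build_transaction_bundle(
    [mother, child, mother_of_child, pregnancy_summary]
)
print(json.dumps(bundle, indent=2))
```

The resulting JSON would be sent as a single POST to the server's base URL, so that all resources and the references between them are accepted or rejected together.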
Data Model
In this chapter, we will provide a comprehensive analysis of the FHIR® data model consisting of previously specified profiles based on standard FHIR resources and logical references between them.
The diagram below represents the complete FHIR data model of the target research: High-risk pregnancy (CMV CGM):
Before diving into the details, it is essential to understand the term “Cardinality” and how it is used in the diagram:
All attributes defined in FHIR have a cardinality as part of their definition: the minimum and maximum number of times the attribute may appear.
The resource cardinalities are as follows:
- (0..n) - The resource is optional and depends on certain events and data in clinical systems. In addition, multiple occurrences are allowed. For example, the Procedure – NewBorn Complications profile describes newborn complication treatments immediately after birth. Such treatment is not always required. Conversely, in certain cases, more than one treatment is required, so the cardinality is defined as 0..n.
- (1..n) - The resource is mandatory (required). Multiple occurrences are allowed. For example, the Patient-Child profile describes the demographic information of a child or children.
- (1..1) - The resource is mandatory and single.
As seen in the diagram, each resource has its own cardinality, represented in bold parentheses, and each logical reference's cardinality is represented in red text on the reference line. For example, a single instance of the Procedure - Delivery resource is linked to the Encounter Delivery resource by the "encounter" field reference (1..1) and to the Patient resource using the "subject" field reference (1..1).
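The cardinality notation above can be checked mechanically. The following Python sketch is only an illustration; the function names are ours and are not part of any FHIR library. It parses a cardinality such as "0..n" or "1..1" and tests whether a given number of occurrences is allowed.

```python
def parse_cardinality(text):
    """Parse '0..n', '(1..1)' or '1..*' into (minimum, maximum).

    A maximum of None means "unbounded" (written as n or * in diagrams).
    """
    low, high = text.strip("() ").split("..")
    maximum = None if high in ("n", "*") else int(high)
    return int(low), maximum


def satisfies(count, cardinality):
    """Return True if `count` occurrences are allowed by the cardinality."""
    minimum, maximum = parse_cardinality(cardinality)
    return count >= minimum and (maximum is None or count <= maximum)


# A delivery with no newborn complication procedures is valid under 0..n,
# but a delivery without any Patient-Child would violate 1..n.
print(satisfies(0, "0..n"))
print(satisfies(0, "1..n"))
print(satisfies(2, "(1..1)"))
```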
OK, now we can deep dive into the data model. First, we will group all profiles according to their logical meanings and define the business logic of the references between them.
Since the target research is focused on clinical and administrative data related to mother and child, all entities are, in fact, linked to the Patient-Mother and Patient-Child resources. In addition, due to strict requirements to link clinical events to a patient’s Encounter (inpatient or outpatient admission information), most of the clinical events will be linked to these resources. The Encounter resource provides comprehensive information about interactions between patient and healthcare provider, including exact time slots and locations. Hence, the reference between resources providing clinical information and Encounters gives the researcher the exact time slots and locations at which clinical events occurred. The data model specifies three types of Encounters: Encounter Delivery, Encounter Pregnancy, and Encounter NewBorn.
- Encounter Delivery describes Birth events starting from Obstetric ER to Patient Mother discharge from the hospital.
- Encounter Pregnancy describes Patient Mother’s visits to the hospital during the pregnancy
- Encounter NewBorn describes events starting from birth to discharge from the hospital.
An important point to mention: resources keeping aggregate information (e.g., Observation Pregnancy Summary) are not linked to Encounters, since they describe summaries only and not clinical events that occurred during patient visits.
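The difference between event resources and aggregate resources can be sketched in a few lines of Python. This is an illustration only: the IDs are hypothetical, the helper name is ours, and the LOINC blood pressure codes shown are the commonly used ones, which may differ from the codes in the project's actual profiles.

```python
def with_encounter(resource, encounter_id):
    """Return a copy of a clinical-event resource that references its Encounter."""
    linked = dict(resource)
    linked["encounter"] = {"reference": f"Encounter/{encounter_id}"}
    return linked


def quantity_mmhg(value):
    """Build a Quantity in millimeters of mercury (UCUM)."""
    return {"value": value, "unit": "mmHg",
            "system": "http://unitsofmeasure.org", "code": "mm[Hg]"}


# A clinical event: a blood pressure measurement taken during a visit.
blood_pressure = {
    "resourceType": "Observation",
    "status": "final",
    "code": {"coding": [{"system": "http://loinc.org", "code": "85354-9",
                         "display": "Blood pressure panel with all children optional"}]},
    "subject": {"reference": "Patient/mother-001"},  # hypothetical ID
    "component": [
        {"code": {"coding": [{"system": "http://loinc.org", "code": "8480-6",
                              "display": "Systolic blood pressure"}]},
         "valueQuantity": quantity_mmhg(118)},
        {"code": {"coding": [{"system": "http://loinc.org", "code": "8462-4",
                              "display": "Diastolic blood pressure"}]},
         "valueQuantity": quantity_mmhg(76)},
    ],
}

# An aggregate resource: it summarizes a pregnancy and has no Encounter.
pregnancy_summary = {
    "resourceType": "Observation",
    "status": "final",
    "code": {"text": "Pregnancy summary"},  # placeholder code
    "subject": {"reference": "Patient/mother-001"},
}

# Only the event resource is tied to a visit (hypothetical Encounter ID).
bp_during_delivery = with_encounter(blood_pressure, "delivery-001")
print(bp_during_delivery["encounter"])
```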
Now, we can group resources by their clinical and logical meanings:
- Patient Mother related resources:
o Observation – Pregnancy Summary: Aggregate information about a certain pregnancy
o Observation – Pregnancy History: Aggregate information about all pregnancies and birth
o Condition – PastMotherDiagnosis: Patient Mother diseases
o PatientChild
- Pregnancy-related resource
o Observation – Blood Pressure: Blood pressure measurements taken during the Mother’s visit to the hospital
- Delivery: Group of resources providing overall and comprehensive information about the delivery with a strong focus on the research requirements. As mentioned before, we referred to external implementation guides (NHS and Netherlands BirthCare) for an appropriate definition of the Delivery workflow. Eventually, we built a target data model by partially adopting certain profiles and elements taken from these external sources. The delivery-related data model is based on the following profiles:
o Procedure - Delivery: defines a delivery method, birth phases, complications, and medications used during the delivery
o Procedure - Complication Treatment: complication reasons and treatments.
o Procedure - Placenta Delivery: Placenta delivery related procedure
o Observation - Placenta: Placenta observation and status
o Observation - Blood Pressure: Blood Pressure measurements taken during the Delivery.
As we can see, Delivery related resources are linked to the Delivery Encounter
- Patient-NewBorn: Group of resources providing Newborn related observation and procedures.
o Observation – UmbilicalCord: Umbilical Cord observation performed immediately after the birth
o Observation – HeadCircumference: Head Circumference measurement
o Observation – NewbornBodyWeight: Newborn body weight
o Observation – NewBornFindings: Findings specified by the research requirements
o Condition – NewBorn: Newborn condition information (diseases, etc.)
o Procedure – NewBornComplication: Newborn treatment procedures
Profile definition example – Observation Pregnancy Summary
In this chapter, we will share an example of the profile definition - Pregnancy Summary based on the Observation resource. The Observation Pregnancy Summary is defined as an aggregating profile summarizing pregnancy-related information according to the research requirements. The Observation resource is comprised of independent components, where each one provides a different piece of information.
Below are a couple of component examples:
- The component below provides "Outcome of pregnancy" information related to the particular pregnancy:
{
  "code": {
    "coding": [
      {
        "system": "http://loinc.org",
        "code": "63893-2",
        "display": "Outcome of pregnancy"
      },
      {
        "system": "http://snomed.info/sct",
        "code": "267013003",
        "display": "Past pregnancy outcome"
      }
    ],
    "text": "תוצאות הריון"
  },
  "valueCodeableConcept": {
    "coding": [
      {
        "system": "http://loinc.org",
        "code": "LA14270-5",
        "display": "Live Birth"
      }
    ],
    "text": "לידה"
  }
}
The component is comprised of two sections:
- The code: what was observed, i.e., the property the observation describes. In our example, it is defined by two codes:
- LOINC code: 63893-2
- SNOMED code: 267013003
Both codes represent the same clinical term: Outcome of Pregnancy
- The result of the observation. In our example, the observation result, or value as the standard calls it, is of data type "CodeableConcept" and refers to one or more commonly used terminologies.
In our example, the value is defined by LOINC code: LA14270-5 - Live Birth. This code is one of the possible options (answers) representing the outcome of pregnancy. You may find an explicit definition here:
As you can see, four possible values represent different Pregnancy Outcomes.
Each component has an additional definition called Component Cardinality, specifying how many times the component may appear.
In our example, the Observation may have 1..* "Outcome of pregnancy" components, one representing the outcome of pregnancy for each newborn.
- The second example provides information about the number of fetuses in the current pregnancy:
{
  "code": {
    "coding": [
      {
        "system": "http://snomed.info/sct",
        "code": "246435002",
        "display": "Number of fetuses"
      }
    ],
    "text": "מס׳ עוברים"
  },
  "valueQuantity": {
    "value": 2,
    "unit": "#",
    "system": "http://unitsofmeasure.org",
    "code": "{#}"
  }
}
As you can see, the component has the same structure as in the previous example, except for the data type of the answer, which is of Quantity data type, indicating that there are 2 fetuses in our pregnancy.
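For a researcher consuming these resources, reading a component comes down to matching its code and then taking the value in whichever data type it carries. The Python sketch below is an illustration rather than part of the project's tooling: the helper name is ours, and the Observation is abbreviated to the two components shown above.

```python
LOINC = "http://loinc.org"
SNOMED = "http://snomed.info/sct"


def component_value(observation, system, code):
    """Return the value of the first component identified by (system, code).

    For a CodeableConcept value the first coding's code is returned;
    for a Quantity value the numeric value is returned.
    """
    for component in observation.get("component", []):
        codings = component.get("code", {}).get("coding", [])
        if any(c.get("system") == system and c.get("code") == code
               for c in codings):
            if "valueCodeableConcept" in component:
                return component["valueCodeableConcept"]["coding"][0]["code"]
            if "valueQuantity" in component:
                return component["valueQuantity"]["value"]
    return None


# Abbreviated Observation holding the two example components.
pregnancy_summary = {
    "resourceType": "Observation",
    "component": [
        {
            "code": {"coding": [
                {"system": LOINC, "code": "63893-2"},
                {"system": SNOMED, "code": "267013003"},
            ]},
            "valueCodeableConcept": {"coding": [
                {"system": LOINC, "code": "LA14270-5", "display": "Live Birth"},
            ]},
        },
        {
            "code": {"coding": [{"system": SNOMED, "code": "246435002"}]},
            "valueQuantity": {"value": 2, "unit": "#",
                              "system": "http://unitsofmeasure.org",
                              "code": "{#}"},
        },
    ],
}

outcome = component_value(pregnancy_summary, LOINC, "63893-2")
fetuses = component_value(pregnancy_summary, SNOMED, "246435002")
print(outcome, fetuses)
```

Because the first component is coded in both LOINC and SNOMED, the same lookup works with either terminology.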
You may find the formal definition of the Observation - Pregnancy Summary profile on our Simplifier page at the link below:
https://simplifier.net/outburntest/pregnancysummary
Challenges:
Here are several challenges encountered during the data model specification:
- Choosing the appropriate approach for a research-based data model that contains aggregate and summary information: The main challenge was to choose appropriate resources to hold aggregate and summary information and to define profiles accordingly. One of these profiles, described in the topic above, is the Observation - Pregnancy Summary.
In addition, transforming the external Birth/Delivery clinical profiles (NHS and BirthCare) into research profiles required a deep clinical understanding of the entire delivery workflow in order to apply appropriate transformations to a research-based data model.
- The unique structure of research-based profiles introduced another challenge: we discovered that the LOINC codes needed to describe certain observable values within the aggregated profiles (e.g., Observation - Pregnancy Summary) did not exist. In our case, it was a code (not a value; please refer to the topic describing the Observation resource structure) describing the method of induced abortion. A request to create a new code was submitted to LOINC.
- We discovered situations in which standard FHIR resource definitions couldn't be used to represent a specific research requirement. In such cases, the appropriate and straightforward solution is to define an extension, as per the FHIR standard. However, the extension should be formally specified and published. One example is a value describing the "Statistical area" information linked to the Patient's address. A request was raised with the CORE IL team, which is responsible for formal extension definitions.
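As a rough illustration of the extension approach from the last challenge, the Python sketch below attaches a "Statistical area" value to a Patient's address. The extension URL, the choice of valueString, the sample area code, and the patient ID are all placeholders of ours; the formal definition belongs to the CORE IL team and may look different.

```python
import copy

# Hypothetical extension URL; the formally published URL may differ.
STATISTICAL_AREA_URL = (
    "http://example.org/fhir/StructureDefinition/statistical-area"
)


def with_statistical_area(patient, area_code):
    """Return a copy of the Patient with the extension on its first address."""
    updated = copy.deepcopy(patient)
    address = updated["address"][0]
    address.setdefault("extension", []).append({
        "url": STATISTICAL_AREA_URL,
        "valueString": area_code,  # assumption: the value is carried as a string
    })
    return updated


patient = {
    "resourceType": "Patient",
    "id": "mother-001",  # hypothetical logical ID
    "address": [{"city": "Jerusalem"}],
}

extended = with_statistical_area(patient, "0123")
print(extended["address"][0]["extension"])
```

Placing the extension on the address element, rather than on the Patient itself, keeps the value next to the data it qualifies.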
Summary:
In this article, we have described the FHIR-based research data specification process, including the implementation methodology, the entire data model, the main differences in approach between clinical and research projects, and the challenges we encountered.