Official Title
A Database and Analytics Study of Free Text Clinical Notes and Structured Data to Investigate Phenotype Associations With Outcomes in Patients With COVID-19
Brief Summary

A retrospective cohort study investigating clinical notes using Natural Language Processing in combination with structured data from the Electronic Health Record (EHR) to create a database for analytics to identify features associated with outcomes.

Detailed Description

Patients admitted to Cambridge University Hospitals (CUH) with COVID-19 have undergone routine
clinical documentation and specific investigation and testing for COVID-19. The pathway for
these patients ranges from supportive measures on the ward to deterioration requiring
Intensive Therapy Unit (ITU) admission and ventilatory support. Patients are also at risk of
developing complications such as Acute Kidney Injury and thromboembolism. Identification of
the risk factors for these and other outcomes, such as the requirement for ventilation, remains
a challenge, and reviewing the clinical data for these patients is critical to
understanding the relationship between patient characteristics and outcomes.

Data is available in structured fields in the EHR; however, it is sometimes
incomplete or inaccurate. An assessment of the free text clinical notes provides an
opportunity to fill in the gaps and provide a much richer dataset for evaluation. We plan to
use Natural Language Processing (NLP) (a field of machine learning that allows computers to
analyse human language) to review Discharge Summaries of patients admitted to hospital with
COVID-19 and convert free text data into structured data for analysis.

The NLP techniques developed by Dr Collier's team include methods for coding free text to
SNOMED CT and other biomedical ontologies. These methods, based on statistical machine
learning from human-annotated texts, have been benchmarked on scientific texts and social
media. In this project we intend to adapt these techniques for patient records. The
adaptation will require a number of human-annotated patient records. The
NLP output will be combined with structured data from the EHR and undergo statistical
analysis to identify the rates of complications in patients with COVID-19 and risk factors
associated with these. This may help to guide management decisions by enabling earlier
intervention to prevent poor outcomes in these patients.
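
The step of turning free text into flags and combining them with structured fields can be illustrated with a minimal sketch. It is not the study's method: the record describes statistical machine learning trained on human-annotated texts, whereas this example substitutes a plain keyword lookup so that it stays self-contained. The lexicon, the labels (placeholders, not real SNOMED CT codes), the field names and the example patients are all hypothetical, and negation ("no evidence of AKI") is not handled.

```python
from __future__ import annotations

import re

# Hypothetical lexicon mapping surface terms to placeholder labels.
# These labels are NOT real SNOMED CT codes.
LEXICON = {
    "acute kidney injury": "AKI",
    "aki": "AKI",
    "pulmonary embolism": "THROMBOEMBOLISM",
    "deep vein thrombosis": "THROMBOEMBOLISM",
}


def extract_flags(note: str) -> set[str]:
    """Return the labels whose terms appear as whole words in the note."""
    text = note.lower()
    return {
        label
        for term, label in LEXICON.items()
        if re.search(r"\b" + re.escape(term) + r"\b", text)
    }


def complication_rate(patients: list[dict], label: str) -> float:
    """Fraction of patients whose discharge summary mentions the label."""
    if not patients:
        return 0.0
    hits = sum(1 for p in patients if label in extract_flags(p["discharge_summary"]))
    return hits / len(patients)


# Invented records: one structured field plus one free text field.
patients = [
    {"patient_id": 1, "ventilated": True,
     "discharge_summary": "Developed acute kidney injury on day 3."},
    {"patient_id": 2, "ventilated": True,
     "discharge_summary": "Course complicated by pulmonary embolism."},
    {"patient_id": 3, "ventilated": False,
     "discharge_summary": "Managed on the ward, taking oral fluids."},
    {"patient_id": 4, "ventilated": False,
     "discharge_summary": "AKI noted, resolved with fluids."},
]

# Stratify by the structured field, then compare rates of the text-derived flag.
ventilated = [p for p in patients if p["ventilated"]]
ward = [p for p in patients if not p["ventilated"]]
print("AKI rate, ventilated:", complication_rate(ventilated, "AKI"))
print("AKI rate, ward:", complication_rate(ward, "AKI"))
```

The whole-word match matters even in a toy example: without it, the abbreviation "aki" would be found inside "taking". A real pipeline would also need negation and context handling, which is one reason a trained model is preferred over a lookup of this kind.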

Completed
COVID-19
Eligibility Criteria

Inclusion Criteria:

- Male and female

- Age range: 18 to 100 years

- Patients admitted to Cambridge University Hospitals with confirmed COVID-19 on lab
testing

Exclusion Criteria:

- Children

- Patients with a negative COVID-19 test
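
The criteria above could be applied to an extracted cohort with a check along the following lines. This is a sketch only: the function and parameter names are hypothetical, and it assumes age is recorded in whole years and the laboratory result is already available as a boolean.

```python
def is_eligible(age_years: int, covid_test_positive: bool) -> bool:
    """Apply the stated criteria: age 18 to 100 and a confirmed positive lab test."""
    return 18 <= age_years <= 100 and covid_test_positive


# Invented examples covering the boundaries of the age range.
print(is_eligible(18, True))    # adult at the lower bound with a positive test
print(is_eligible(17, True))    # child, excluded
print(is_eligible(50, False))   # negative test, excluded
```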

Eligibility Gender
All
Eligibility Age
Minimum: 18 Years ~ Maximum: 100 Years
Countries
United Kingdom
Locations

Cambridge University NHS Foundation Trust
Cambridge, United Kingdom

University of Cambridge
NCT Number
Keywords
Natural Language Processing
MeSH Terms
COVID-19