Official Title
Validation of Machine Learning (ML) Models as Diagnostic Tools to Predict Infection With SARS-CoV-2
Brief Summary

The Covid-19 viral pandemic has caused significant global losses and disruption to all aspects of society. Major difficulties in controlling the spread of this coronavirus have been the delayed and mild (or absent) presentation of symptoms in infected individuals, and the insufficient Covid-19 testing capacity in the UK. This warrants the development of alternative diagnostic tools that reliably detect Covid-19 in the early stages of infection, while also being low-cost, low-burden, and easily administered to a large proportion of the population. This study aims to validate machine learning models as a diagnostic tool that predicts infection with SARS-CoV-2 from app-reported symptoms and phenotypic data, against the 'gold-standard' swab PCR test. The study will take place within the Covid Symptom Study app, a free symptom-tracking mobile application launched in March 2020.

Detailed Description

The Covid-19 viral pandemic has caused significant global losses and disruption to all
aspects of society (including health, education, and business and economic security). One of
the major difficulties in controlling the spread of this coronavirus has been the delayed and
mild (or lack of) presentation of symptoms in infected individuals. Moreover, there is
insufficient Covid-19 testing capacity in the UK, and only moderate accuracy of such tests at
confirming coronavirus infection. Together, these obstacles have allowed many coronavirus
cases to go undetected and to fuel viral spread in the population, by compromising the
stringency of self-isolation measures undertaken by infected individuals who might
otherwise have curbed or prevented their transmission of the virus. The profound and
widespread cost of the continuing Covid-19 progression, coinciding with the lack of testing
capacity, warrants the development of alternative diagnostic tools that reliably assess
Covid-19 infection in its early stages, while also being low-cost, low-burden,
and easily administered to a large proportion of the population.

The free symptom-monitoring app 'Covid Symptom Study' was launched in mid-March 2020 by health
technology start-up Zoe Global Ltd, and is currently being used in the UK, US and Sweden,
with more than 2.7 million users in the UK alone who use the app to self-report their
Covid-19 symptoms. Upon registering to use the app, users are asked to report demographic and
phenotypic data such as age, sex, BMI, ethnicity, contact with infected individuals (in
a healthcare professional capacity), smoking behaviour, and existing health conditions, among
other information. From then on, users are asked to report, on a daily basis, their
presentation of symptoms attributable to Covid-19 (or lack thereof) through the use of
app-administered questionnaires, thus enabling real-time tracking of disease progression
across the UK. The app also allows users to report their Covid-19 test results, thus enabling
the development of prediction algorithms based solely on self-reported user data to predict
the presence of infection in untested users.

On behalf of Zoe Global Ltd, the UK Department of Health and Social Care, with support from
the UK's Chief Scientific Advisor, has committed to testing up to 10,000 app users per week for
infection with SARS-CoV-2 across England and Northern Ireland, for the purpose of rapidly
improving the accuracy of symptom-based predictions. A similar testing allowance may follow in
Scotland and Wales.

Symptomatic app users will be asked to get tested for SARS-CoV-2 infection with the widely
used swab and qRT-PCR technique, and to report their test results in the app while
continuing to log their symptoms.

This validation study, conducted at King's College London, aims to validate the sensitivity
and specificity of machine learning models as a diagnostic tool that predicts infection with
SARS-CoV-2 based on app-reported symptoms and phenotypic data, against the 'gold-standard'
swab PCR-test, by utilising the Covid Symptom Study app as a research platform.
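
As an illustration of the two metrics named above, the following is a minimal sketch (not the study's analysis code) of how sensitivity and specificity could be computed from paired model predictions and swab PCR results. The function name and the example data are hypothetical; the statistical analysis plan attached to this record defines the actual procedure.

```python
def sensitivity_specificity(predicted, pcr_result):
    """Return (sensitivity, specificity) for paired boolean sequences.

    predicted[i]  -- True if the model predicts infection for participant i
    pcr_result[i] -- True if the swab PCR test (reference standard) is positive
    """
    if len(predicted) != len(pcr_result):
        raise ValueError("predicted and pcr_result must have the same length")

    pairs = list(zip(predicted, pcr_result))
    tp = sum(1 for p, r in pairs if p and r)          # true positives
    fn = sum(1 for p, r in pairs if not p and r)      # false negatives
    tn = sum(1 for p, r in pairs if not p and not r)  # true negatives
    fp = sum(1 for p, r in pairs if p and not r)      # false positives

    # Sensitivity: share of PCR-positive participants the model flags.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # Specificity: share of PCR-negative participants the model clears.
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical illustrative data, not study results.
example_predicted = [True, True, False, False, True, False]
example_pcr = [True, False, False, True, True, False]
sens, spec = sensitivity_specificity(example_predicted, example_pcr)
```

Returning NaN when a class is empty avoids a division by zero in small validation batches; a real analysis would also report confidence intervals.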

It is hypothesised that by training the symptom-based models using swab test results and
through multiple model iterations following continuous data input from reporting and tested
app users, predictions of infection will be made with considerable accuracy, thus enabling
the Covid Symptom Study app to be used as a diagnostic tool that alleviates the strain on
testing capacity in the UK while being easily accessible and posing low user burden.

Study Design:

Because the Covid-19 pandemic is developing rapidly and its duration and intensity are
uncertain, the study design is prospective and enables regular iteration on prediction
models and continuous accumulation of validation data. The study consists of a series of
phases, each lasting 14 days. Before the start of each phase (day 0), a set of machine
learning models will be frozen and submitted for validation on data collected during this and
subsequent phases.

Machine learning models generally improve as more data become available. Therefore,
validation phases will continue as long as tests are available and app users consent to
joining the study. Due to the uncertainty around the progression of UK infection rates,
the validation study will continue for as long as it is of value to public health.

A detailed statistical analysis plan is described in the document attached to this record. A
record of all machine learning models used for validation will be regularly updated on GitHub
(https://github.com/zoe/covid-validation-study).

Recruiting
COVID-19

Diagnostic Test: Covid-19 swab PCR test

Participants satisfying machine learning test criteria will be asked to take a swab test for Covid-19.

Eligibility Criteria

Study Inclusion Criteria - app users will be eligible to join the study if they:

- Are based in the UK (are using the UK version of the Covid Symptom Study app, and
have listed a UK postcode)

- Are the primary app user (are reporting directly for themselves)

- Are at least 18 years of age

- Have not previously tested positive for Covid-19 (but may have been tested)

Study Exclusion Criteria - participants are ineligible for the study if they:

- Do not meet inclusion criteria

- Do not provide informed consent to participate

Participants will be subject to further screening to identify them as eligible for swab
testing during the course of the study.

Swab inclusion criteria - participants will be eligible for swab testing if they:

- Have reported in the app at least once in the previous 3 days (days -2 to 0), and at
least two times in the previous 9 days (days -8 to 0). All reports must be healthy
(i.e. not experiencing any symptoms).

- On the previous day (day 1), have reported that they are experiencing at least one
symptom described in the app. Symptoms in the app are updated when deemed appropriate
by study investigators using evidence-based reports in the scientific and medical
field.

- Have answered the phenotype fields required for the prediction model with
physiologically plausible values.

Swab exclusion criteria - participants are ineligible for swab testing if they:

- Are asymptomatic

- Do not satisfy the inclusion criteria for testing.
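
The swab inclusion criteria above can be expressed as a simple check. This is a sketch under stated assumptions, not the study's screening code: each report is assumed to be a (day, healthy) pair that uses the day numbering given in this record, the function name `eligible_for_swab` is hypothetical, and the plausibility of phenotype fields is assumed to have been checked elsewhere and passed in as a flag.

```python
def eligible_for_swab(reports, phenotype_ok):
    """Apply the swab inclusion criteria to one participant.

    reports      -- iterable of (day, healthy) pairs; day is an integer relative
                    day as numbered in the record, healthy is True when no
                    symptoms were reported (assumed representation)
    phenotype_ok -- True if the required phenotype fields hold physiologically
                    plausible values (assumed to be validated elsewhere)
    """
    reports = list(reports)
    baseline = [healthy for day, healthy in reports if -8 <= day <= 0]
    recent = [healthy for day, healthy in reports if -2 <= day <= 0]

    enough_reports = len(recent) >= 1 and len(baseline) >= 2
    all_baseline_healthy = all(baseline)
    # At least one symptomatic report on day 1, the day named in the criteria.
    symptomatic_on_day_1 = any(day == 1 and not healthy for day, healthy in reports)

    return (enough_reports and all_baseline_healthy
            and symptomatic_on_day_1 and phenotype_ok)


# Hypothetical participant: healthy on days -5 and -1, symptomatic on day 1.
example_reports = [(-5, True), (-1, True), (1, False)]
example_eligible = eligible_for_swab(example_reports, phenotype_ok=True)
```

Asymptomatic participants are excluded automatically here, because the check requires a symptomatic report on day 1.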

Insufficient testing capacity:

If insufficient testing capacity is available for the study population as described, then
recruitment will be prioritised according to:

- Firstly, most recent final healthy report before reporting symptoms

- Secondly, highest number of healthy reports during the 9 days before
reporting symptoms

- Thirdly, randomised selection between participants of equal priority
according to the first two rules above.
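
One way to apply the three prioritisation rules is a random shuffle followed by a stable sort, so that ties on the first two rules keep their random order. The sketch below is illustrative only; the field names `last_healthy_day` and `healthy_count`, the function name, and the example candidates are assumptions, not part of the study's software.

```python
import random


def prioritise(candidates, capacity, seed=None):
    """Select up to `capacity` candidates following the three rules.

    Each candidate is a dict with (assumed) keys:
      last_healthy_day -- relative day of the final healthy report before
                          symptoms (0 is more recent than -3)
      healthy_count    -- number of healthy reports in the 9 days before symptoms
    """
    rng = random.Random(seed)
    shuffled = list(candidates)
    rng.shuffle(shuffled)  # rule 3: random order among equal-priority candidates

    # Python's sort is stable, so the shuffled order survives for exact ties.
    ranked = sorted(
        shuffled,
        key=lambda c: (-c["last_healthy_day"], -c["healthy_count"]),
    )
    return ranked[:capacity]


# Hypothetical candidates for illustration.
example_candidates = [
    {"id": "a", "last_healthy_day": -2, "healthy_count": 5},
    {"id": "b", "last_healthy_day": 0, "healthy_count": 2},
    {"id": "c", "last_healthy_day": 0, "healthy_count": 4},
]
example_selected = [c["id"] for c in prioritise(example_candidates, capacity=2)]
```

Passing a fixed `seed` makes the random tie-break reproducible, which may help when the selection needs to be audited.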

Excess testing capacity:

If excess testing capacity is available beyond the study population as described, then
inclusion criteria will be expanded in order to adequately sample across under-represented
population groups.

Specifically, on day 7 of each validation phase, investigators will assess:

- What excess testing capacity is available, if any

- Which subgroups are under-represented compared to their proportion in the UK
population (as best as can be established given that some participants may not have
completed some phenotype fields):

(i) Age decade (ii) Sex (iii) Ethnicity (iv) BMI category

For under-represented groups, investigators may additionally recruit participants with only
one report during the previous 3 days (days -2 to 0) and no other report during the
previous 9 days (days -8 to 0).
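
The day-7 assessment of under-represented subgroups could be sketched as a comparison of sample proportions with reference population proportions. The example below is a simplified assumption: the function name is hypothetical, the reference proportions are placeholders rather than UK statistics, and participants with a missing phenotype field (given as None) are left out of the denominator.

```python
from collections import Counter


def under_represented(sample_labels, reference_props):
    """Return the subgroups whose sample share is below the reference share.

    sample_labels   -- one subgroup label per participant, or None if the
                       phenotype field was not completed
    reference_props -- dict mapping subgroup label to its population proportion
                       (placeholder values in this example)
    """
    known = [label for label in sample_labels if label is not None]
    if not known:
        # Nothing is known about the sample, so every subgroup counts as missing.
        return sorted(reference_props)

    counts = Counter(known)
    total = len(known)
    return sorted(
        group
        for group, expected in reference_props.items()
        if counts[group] / total < expected
    )


# Hypothetical sample and placeholder reference proportions.
example_sample = ["F", "F", "F", "M", None]
example_reference = {"F": 0.5, "M": 0.5}
example_groups = under_represented(example_sample, example_reference)
```

The same function can be run once per characteristic listed above (age decade, sex, ethnicity, BMI category), each with its own reference table.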

Eligibility Gender
All
Eligibility Age
Minimum: 18 Years ~ Maximum: N/A
Countries
United Kingdom
Locations

King's College London
London, United Kingdom

Investigator: Inbar Linenberg, MSc
Contact: +447791871699
inbar.linenberg@kcl.ac.uk

Contacts

Inbar Linenberg, MSc
+447791871699
inbar.linenberg@kcl.ac.uk

King's College London
NCT Number
Keywords
Covid-19
Machine learning
Covid-19 diagnostic
MeSH Terms
COVID-19