Data inspires treatments and cures for diseases like COVID-19, but solutions must accept and resolve data’s flaws, writes Product Manager Graham King.
COVID-19 raises many questions. Why do some people become so much sicker than others, with some experiencing prolonged headache and mild fever and others requiring intensive care? Why is diagnosis of COVID-19 such a moving target? What symptoms and predictors of risk have we missed?
Who will unknowingly ward off the disease and who will become gravely ill? Building on experience with the human cell protein that HIV binds to, scientists could use an understanding of individuals’ genetic differences to find targets for vaccination and therapy.
And why do Black, Asian and minority ethnic (BAME) groups have worse COVID-19 outcomes than others? A new ‘big data’ study of English COVID-19 hospital deaths suggests that increased risk of death in BAME groups is only partially attributable to comorbidities or other risk factors.
The study has yet to be peer-reviewed, but it highlights the great need for linked data at scale, a need that applies across healthcare. Building and testing reliable Artificial Intelligence and Machine Learning (AI/ML) applications will require large quantities of standardised data.
But to get there, we need to adopt four principles:
1. Accept heterogeneity: We shouldn’t expect clinicians to neatly code data as it’s captured. We need smart platforms which ingest differing descriptions of the same clinical concepts and can map them later into comparable forms for analysis.
Unstructured text needs similar treatment. Symptoms like loss of taste/smell – recently added to the UK NHS list of COVID-19 symptoms – are rarely coded officially but often present in clinicians’ textual comments. Text mining tools will identify otherwise unrecognisable cohorts for COVID-19 and other diseases and help uncover higher-risk groups.
2. Combine datasets: The new study of COVID-19 hospital deaths mentioned earlier linked an NHS England/NHSX COVID dataset with pseudonymised GP records from vendor TPP. Ethnicity was used as a data point but was missing in over a quarter of the records. We need platforms that allow datasets to be combined from, and enriched by, several sources, including individuals themselves.
3. Start digital? Stay digital! Digitally captured vital signs such as blood pressure, temperature and blood oxygen concentration are routinely read off a display by eye and then keyed into a computer system. This risks generating erroneous or missing records. Blood oxygen measurements are crucial in COVID-19: a patient’s levels can drop without symptoms until a rapid deterioration, a phenomenon known as ‘silent hypoxia’. Systems need to capture all available digital data, minimising human error, enabling monitoring and alerting, and helping drive research into the early warning signs.
4. Move to open standards: Currently, valuable data is sequestered in clinical systems which were never designed to supply it at scale. An example is the laboratory blood test for D-dimer, which may help predict dangerous COVID-19 clotting complications. Similarly, valuable data is often unavailable or stored in unreadable proprietary formats – cardiac monitoring (ECG) is a notorious example. Today, that means we need solutions which can open system silos, decipher proprietary formats and map to open standard codes and formats. And whilst accepting heterogeneity today is important, we need to move to standardised measurements to allow comparisons – such as for blood pressure – and to standard codes and formats for storing vital signs like temperature and ECG. Purchasers must demand standards and test new systems for adherence to them.
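To make the first and fourth principles a little more concrete, here is a minimal Python sketch of what "accept heterogeneity now, map to comparable forms later" might look like. Everything in it is an assumption made for illustration: the synonym table, the concept labels (`ANOSMIA`, `AGEUSIA`, `FEVER`) and the function names are hypothetical and do not come from any real terminology or product. A production platform would map to an established open terminology and use far more robust text mining than simple phrase matching.

```python
import re

# Hypothetical synonym table: the phrases and concept labels are illustrative
# only and are not taken from any real clinical terminology.
SYNONYMS = {
    "loss of smell": "ANOSMIA",
    "anosmia": "ANOSMIA",
    "can't smell": "ANOSMIA",
    "loss of taste": "AGEUSIA",
    "ageusia": "AGEUSIA",
    "fever": "FEVER",
    "pyrexia": "FEVER",
}


def map_concept(description):
    """Return the shared concept label for a local description, or None."""
    # Lower-case and collapse whitespace so trivially different spellings match.
    key = re.sub(r"\s+", " ", description.strip().lower())
    return SYNONYMS.get(key)


def find_mentions(note):
    """Return the set of concept labels mentioned in a free-text note."""
    text = note.lower()
    found = set()
    for phrase, concept in SYNONYMS.items():
        # Whole-phrase match only, so "fever" does not match "feverfew".
        if re.search(r"\b" + re.escape(phrase) + r"\b", text):
            found.add(concept)
    return found


def to_celsius(value, unit):
    """Normalise a temperature reading to degrees Celsius (one decimal place)."""
    unit = unit.strip().upper()
    if unit in ("C", "CEL"):       # "Cel" is the UCUM code for degrees Celsius
        return round(value, 1)
    if unit in ("F", "[DEGF]"):    # "[degF]" is the UCUM code for Fahrenheit
        return round((value - 32) * 5 / 9, 1)
    raise ValueError(f"unrecognised temperature unit: {unit!r}")


if __name__ == "__main__":
    print(map_concept("  Loss of  Smell "))
    print(sorted(find_mentions("Pt reports loss of taste and anosmia.")))
    print(to_celsius(100.4, "F"))
```

The point of the sketch is the order of operations: the raw description, note or reading is stored as captured, and the mapping to a shared concept or unit is applied afterwards, so the mapping can be improved later without asking clinicians to re-enter anything.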
These principles are essential if data is to drive science and harness the power of AI/ML to help us respond to 21st-century healthcare challenges. That means not just COVID-19, but also the many other health challenges we face as a global community: infectious diseases, ageing populations and deaths from non-communicable diseases such as cancer, diabetes and heart disease.