The British Heart Foundation (BHF) Data Science Centre – a partnership between Health Data Research UK (HDR UK) and the BHF – works with partners, including the NHS, to carry out research by utilising health data.
Having celebrated its first birthday in January, the centre has recently reported on the work done by its ‘CVD-COVID-UK’ consortium that is looking into nationwide linked population healthcare datasets, to understand the relationship between COVID-19 and cardiovascular diseases.
To find out more about the consortium and how researchers and clinicians can get involved in health data research, HTN spoke to the centre’s director, Professor Cathie Sudlow.
Hi Cathie, tell us about yourself and your role at the BHF Data Science Centre.
I’m a neurologist, a doctor by trade, and I’m trained in epidemiology and – to some extent – in statistics.
I cut down on my clinical work around 10 years ago and started working on major UK-wide research programmes as Chief Scientist for UK Biobank. My role there was very much focused on how to follow up half a million people spread throughout the UK and find out what was going on with their health.
Clearly the way to do that was to link into their NHS records. So, I learnt a huge amount and became expert in health data linkage: where to go for what data, where the data custodians were, and how they worked across most parts of the UK and the different devolved nations – what worked, what didn’t, what the bottlenecks were.
I then became involved in HDR UK. I started off leading the Scottish site, then moved over into directing the new BHF Data Science Centre, which has been set up within HDR UK. We’re a major centre within the overall institute.
I’m still based in Edinburgh but, like most of my colleagues, I spend my work life these days in cyberspace.
What does the centre do?
Our mission is to improve the cardiovascular health of the public through using large-scale data analytics applied across the UK.
Essentially, we are about the application of health data science at scale – across the UK and internationally – with a cardiovascular health focus. So we are interested in diseases of the heart and circulation, including heart attacks, strokes and blood clots in the legs and lungs.
We have a small core team within HDR UK. Then we have multiple partnerships across the country with clinicians – who are cardiologists, stroke physicians, cardiac and vascular surgeons and various others – and with clinical trialists and the cardiovascular research community, particularly those interested in working with data.
Health data has been very much in the public eye throughout the course of the COVID-19 pandemic. The importance of data and deriving insights from it has become increasingly publicly prominent. It’s been great to be involved at the time that more doors than normal have been open. There has been more progress in the last 12 months than there had been for quite a few years before that – it’s really important we don’t now go backwards.
It’s motivated by COVID but, actually, some people might say ‘why have we been ignoring the pandemics of heart disease, cancer, diabetes and all the rest that have been going on for so many years beforehand?’
So, it’s been really good to have this common purpose across the public and research community. But there’s a need now, moving forward, to apply this new dynamic approach to a whole range of other disorders.
What was the data research situation like in the UK before the pandemic?
When we went into the pandemic, while some things worked, we were not in a good place compared to where we could have been in terms of a linked-up health data system for deriving insights rapidly and at scale.
In Wales, there was a set-up called SAIL (Secure Anonymised Information Linkage) that had been developed over the previous decade or so for incorporating nationally collated health records – from hospitals, GPs, death certificates and a range of other settings – into a safe and secure, protected environment. You could look backwards and follow people forwards over time.
There was a similar set-up in Scotland, with data provided securely within the Scottish National Data Safe Haven. In fact, Scotland had been doing this sort of thing for longer than anywhere else in the UK.
In England, the major data custodian for national health datasets is NHS Digital. It did have a model to support research and other legitimate uses of data by people outside of the organisation. But supporting research has always been a relatively small amount of their activity and there was no trusted research environment where they could hold data and provide access securely and efficiently to approved researchers.
So that’s essentially what we’ve co-developed here with NHS Digital – their first trusted research environment to hold pretty much the entire population of England’s data, linking across different health data sources.
What do those data sources include?
GP data covering 98% of practices, data from all the hospitals, and death certificates. We’ve also included COVID datasets (testing and vaccination data) in this linked healthcare data asset to support COVID-related research, the driver exemplar that has motivated building this capacity.
Because we’ve got a cardiovascular remit, we’ve also been working with the providers and collectors of data from the national cardiovascular audits, which provide additional information on strokes, heart attacks and cardiovascular procedures. Those are now flowing into NHS Digital and being linked with the other data assets.
It’s been months and months of work in partnership with NHS Digital to get it set up and ready to allow approved researchers to start to work with the data and derive really important insights.
There are about 50 approved data analysts now working with the CVD-COVID-UK consortium, either in the new NHS Digital environment, or in the established environments in Scotland or Wales.
So, you’ll be working with some of the devolved nations?
We set out with this idea that it would be really good to be able to draw insights and run studies that capture the whole of the UK. We also realised that it would be difficult for data to flow across national borders, and that there were already secure environments in Scotland and Wales but nothing in England. So, having a separate arrangement in England was going to be the solution.
Northern Ireland are now moving towards setting up something similar. And so, eventually, what we envisage is a network of these trusted research environments that can hold multiple types of data from different sources. They won’t all be identical, because the data are collected in somewhat different ways in the different nations. But it should be possible to run analyses in each environment and then bring the results together.
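The idea of running an analysis separately in each environment and then combining the results can be illustrated with a minimal Python sketch. It is an assumption-laden illustration rather than the consortium's actual method: the `SiteSummary` type, the function names and the choice of a pooled event rate are all hypothetical. The point it shows is that only aggregate counts, never individual-level rows, leave each environment.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class SiteSummary:
    """Aggregate result exported from one environment (hypothetical shape)."""
    site: str
    events: int
    person_years: float


def summarise_site(site: str, rows: Iterable[Tuple[bool, float]]) -> SiteSummary:
    """Runs inside a single environment.

    Each row is (had_event, years_followed) for one pseudonymised person.
    Only the totals are returned, not the rows themselves.
    """
    events = 0
    person_years = 0.0
    for had_event, years_followed in rows:
        if had_event:
            events += 1
        person_years += years_followed
    return SiteSummary(site, events, person_years)


def pooled_rate_per_1000(summaries: Iterable[SiteSummary]) -> float:
    """Runs outside the environments, on the exported summaries only."""
    summaries = list(summaries)
    total_events = sum(s.events for s in summaries)
    total_person_years = sum(s.person_years for s in summaries)
    if total_person_years == 0:
        raise ValueError("no follow-up time in any summary")
    return 1000 * total_events / total_person_years
```

In this sketch, `summarise_site` would be called once per environment and `pooled_rate_per_1000` once on the collected summaries. A real study would likely need more than a crude pooled rate, for example adjustment for differences in how each nation collects its data.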
How would any interested researchers and clinicians get involved?
We already have a consortium of about 160 – and growing – researchers, data custodians, analysts and clinicians around the country. But we’re very open and inclusive. We want as many interested members of the research community as possible to be able to join.
There are two ways to do that. They can email us at firstname.lastname@example.org and we simply onboard them, asking them to agree to the consortium principles around sharing information transparently, working in a coordinated way and sharing protocols in the open on our GitHub. Or they can lodge a query via the HDR UK Innovation Gateway.
We have a streamlined way for researchers to propose projects if they want to get involved in analysing the data. Crucially, we have patients and members of the public involved in that process, to make sure we are supporting projects that are priorities for the people whose data we are analysing. We currently have approvals in place to support research projects that link COVID-19 to cardiovascular disease, but we are looking to broaden our approval so we can support more urgent COVID-19 research.
Some clinicians and researchers want to join our consortium but not directly analyse data. They may want to be involved in suggesting areas of research that would be interesting or important. They may also be able to help in designing studies.
Clinicians in particular can help in two ways. One is interpreting the results and thinking about their relevance to clinical practice. The second is helping to choose the right codes to capture the health conditions of interest from hospital and general practice coding systems. Clinicians with knowledge of these coding systems and expertise in clinical informatics have a valuable role to play.
What else is the centre working on at the moment?
We have got six thematic areas of activity. The first of those is enabling better access for cardiovascular research to structured health data. CVD-COVID-UK is currently the big driver project for that.
The next one is unstructured data. We’re interested in working with UK-wide experts on large-scale access to and analyses of imaging data (e.g. from brain scans of patients with stroke, or heart scans) as well as on text mining and natural language processing techniques to surface valuable information for research studies from electronic medical records.
We have a workstream on personal monitoring data. As you know, a lot of people now collect data on themselves during their everyday lives. Apps and wearables (such as Fitbits or Apple Watches) give us a lot of information about cardiovascular health and they may provide really useful objective information on physical activity and other health-related measures.
We also have a new area on data-enabled clinical trials. It’s possible to use linked health records in a really efficient and cost-effective way to find out who may be eligible to participate in a trial, to invite them to take part and to follow their health.
Another workstream, which has the rather sexy title of ‘computable clinical phenotypes’, is focused on how to represent health and disease in computable form. For example, what are all the codes that you would combine to signify that somebody has had a heart attack?
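A computable phenotype can be pictured as a named set of codes plus a rule for matching them. The Python sketch below is a simplified illustration under stated assumptions, not one of the centre's phenotype definitions: it uses only the ICD-10 categories I21 (acute myocardial infarction) and I22 (subsequent myocardial infarction), whereas a validated phenotype would typically also draw on general practice coding systems and clinical review. The names `HEART_ATTACK_ICD10_PREFIXES`, `normalise_code` and `has_heart_attack` are hypothetical.

```python
from typing import Iterable

# Illustrative only: ICD-10 I21 = acute myocardial infarction,
# I22 = subsequent myocardial infarction. Not a validated code list.
HEART_ATTACK_ICD10_PREFIXES = ("I21", "I22")


def normalise_code(code: str) -> str:
    """Strip whitespace and dots and upper-case, so 'i21.9' becomes 'I219'."""
    return code.strip().replace(".", "").upper()


def has_heart_attack(diagnosis_codes: Iterable[str]) -> bool:
    """True if any recorded code falls under one of the phenotype's prefixes."""
    return any(
        normalise_code(code).startswith(HEART_ATTACK_ICD10_PREFIXES)
        for code in diagnosis_codes
    )
```

Matching on prefixes means that more specific subcodes such as I21.9 are captured without listing each one, which is one common design choice for hierarchical coding systems.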
Finally, we also have a workstream on cohorts. Our focus will be on working with researchers who are setting up UK-wide disease-based cohorts in the cardiovascular arena. We hope to work with them to help realise the benefits of linking the data in the cohorts to all the rich data available from routine national health datasets.
Can you tell us more about the safeguarding of data?
It’s really important. Without good security and privacy and the trust of the public, the whole thing potentially breaks down.
The way these trusted research environments work is that the data from the NHS (for example, from GP practice and hospital computer systems) and other settings (like death registry offices) don’t need to leave the place where they are collected together. They’re brought together on behalf of the NHS by national bodies and they don’t necessarily have to be disseminated.
Not moving these data around more than necessary is a very important principle. The other principle is that researchers don’t have any need to know the identities of the individuals. All the direct identifiers – names, addresses, exact dates of birth, NHS numbers – are stripped from the data. They’re replaced by pseudo-identifiers, so the data continue to represent an individual, but their identity is not known.
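The replacement of direct identifiers with pseudo-identifiers can be sketched in a few lines of Python. This is a minimal illustration, assuming a keyed hash (HMAC-SHA256) over the NHS number; the interview does not say how the pseudo-identifiers are actually generated. The field names, the ISO `YYYY-MM-DD` date format and the decision to keep the year of birth are likewise assumptions made for the example.

```python
import hashlib
import hmac

# Hypothetical field names; real datasets will differ.
DIRECT_IDENTIFIERS = {"name", "address", "date_of_birth", "nhs_number"}


def pseudonymise(record: dict, secret_key: bytes) -> dict:
    """Return a copy of `record` with direct identifiers removed.

    The same NHS number always yields the same pseudo-identifier under a
    given key, so records for one person can still be linked. Without the
    key, the mapping cannot be recomputed from the NHS number.
    """
    digest = hmac.new(
        secret_key, record["nhs_number"].encode("utf-8"), hashlib.sha256
    )
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["pseudo_id"] = digest.hexdigest()[:16]
    # Assumption: dates are ISO strings (YYYY-MM-DD); keep only the year,
    # since it is the exact date of birth that is treated as identifying.
    cleaned["year_of_birth"] = int(record["date_of_birth"][:4])
    return cleaned
```

In a set-up like this, the key would be held by the data custodian rather than by researchers, so the people analysing the data see only the pseudo-identifier.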
What improvements are you hoping to see from projects like these?
My hope for the UK is that we maintain the momentum and extend the capabilities to readily support research across a whole range of areas. It can’t just be all about COVID. There are a lot of other pandemics we need to sort out.
There’ll be a lot of work over the next few years on how we apply sophisticated analyses to truly gain a picture across country borders. I would argue that, for some purposes, we need a Europe-wide or global scale.
Importantly though, we have already made great strides towards making data securely available for research discoveries that will improve people’s lives.
To find out more about the centre’s current projects, visit the dedicated section of the HDR UK website or follow @BHFDataScience on Twitter. Interested parties can email email@example.com or use the HDR UK Innovation Gateway for more information about how to work with the centre.