From Academia to the NHS: Thoughts from a Data Scientist in this time of crisis

By Benjamin Taylor, Lead Data Scientist, Blackpool Teaching Hospital NHS FT for HTN.

Ben, the first ever data scientist at the Trust, started on 23rd March 2020. Here Ben covers some of his experiences so far, the predictive modelling he has done and his challenges and ambitions.

Having spent all but one of the last 20 years in academia, following the path from undergraduate to lecturer, my decision to move into the NHS as a data scientist did not come lightly. I did not wish to take just any position in data science (there are lots, you see); I wanted to make use of my training to help others in some way, and this move is exactly that to me. And what a time to join!

Because my eldest son is immunocompromised, we’ve been advised to isolate, so I spent my first day at the Trust working from home. I was able to put my expertise in spatiotemporal statistics to use straight away: developing COVID-19 forecasting algorithms that make use of national-scale data to help inform healthcare planning locally.

On the second day, I ventured into work briefly to pick up my Trust laptop and hold a strange set of socially-distanced meetings outside my office in the car park, first as a pair, then in a triangle and later in a square. After these meetings, I was back home again and engrossed in my coding … something I’ve always enjoyed doing since I was a child in the mid ’80s.

Some of you may not be familiar with the concept of data science. It is a somewhat modern term that encompasses a mix of statistics and machine learning, but at the end of the day, what I’m about is trying to make the best use of the data that the Trust hold. And anyone who has been into the Trust archives will know that we hold absolutely masses of data!

We are collecting data at increasingly fine scales too, by which I mean that when a patient comes into the hospital, their movements through the system can be logged, with treatments and characteristics (like vital observations at triage, clinical frailty score, optimal placement, reasons for delay and expected discharge date) being recorded in real time. To me, the complexity is more than a bit fascinating, and I’m going to make it my mission to get the most out of these data.

Data science sometimes makes use of very sophisticated methods to extract patterns in data. Once we have identified these patterns, we can use them to help plan care at an individual, ward, or hospital level. For instance, we could predict in advance the likelihood of an individual going into intensive care, or the likelihood that they will be re-admitted with some complication following surgery. Computers are much better at spotting patterns in data than humans are, even expert humans (this has been demonstrated recently by the advent of algorithms such as AlphaZero and AlphaGo). But it should be obvious too: they are able to read huge amounts of data extremely quickly, they have perfect memories, and they can perform, at lightning speed and without making a single mistake, extremely difficult calculations that no human could do.

This is not, of course, to undervalue the input of humans into this process! We totally rely on experts in the system to provide us with the information we need to make decisions, right from the time that data is entered onto the system. We have a saying in data science: “garbage in, garbage out”, or words to that effect! When developing statistical or machine learning models, we should always listen to the experts – and for me, that is everyone else in the hospital. I am really looking forward to understanding more about the way our hospital runs and making use of all the expert knowledge we have to help inform decision-making processes. There is even a formal inferential paradigm that allows for the incorporation of expert knowledge into statistical models: something known as Bayesian methods.
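To give a flavour of how expert knowledge can enter a Bayesian model, here is a minimal sketch using the textbook Beta-Binomial update. It is an illustration only, not the modelling framework used at the Trust. The prior values, the patient counts and the names `BetaBelief` and `update` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BetaBelief:
    """A Beta(alpha, beta) belief about a proportion, e.g. the share of
    admitted patients who go on to need intensive care."""
    alpha: float
    beta: float

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)


def update(prior: BetaBelief, events: int, trials: int) -> BetaBelief:
    """Conjugate update: add observed events and non-events to the prior."""
    return BetaBelief(prior.alpha + events, prior.beta + (trials - events))


# Hypothetical expert view: "about 1 in 5 patients need intensive care",
# held with the weight of roughly 10 previously seen patients.
prior = BetaBelief(alpha=2.0, beta=8.0)

# Hypothetical new data: 9 of 30 admitted patients needed intensive care.
posterior = update(prior, events=9, trials=30)

print(f"prior mean:     {prior.mean:.3f}")
print(f"posterior mean: {posterior.mean:.3f}")
```

The posterior mean sits between the expert's prior view and the observed rate, and it moves closer to the data as more patients are observed.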

In the meantime, I’m contributing to the development of a daily forecasting system to help plan our resources for COVID-19: this week, I’m helping to understand and predict demands on beds and intensive care facilities. The modelling framework I’ve been using up until now (see https://arxiv.org/abs/1704.05627 ) is rather technical and there are challenges in making the results of such complex analyses widely accessible. For me, two-way communication is the key here. As a data scientist, the way I think about, talk about and visualise data might be very different from the way those in front-line decision-making roles do. I am in the business of uncertainty quantification: when I present the results of an analysis, I must include some measures of how unsure I am about the predictions. This can be at odds with the needs of decision-makers, who sometimes just need to know a figure – so it is a case of arguing the need for the uncertainty and figuring out a way to communicate it so that it is useful. I’m quite sure this is just the beginning of a dialogue, and while I need to understand more about what information senior decision-makers need, I hope they will also come to understand more about the benefits of including uncertainty in their decision-making processes. I hope we will work together to ask increasingly complex questions of the data we hold to the benefit of our patients.
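One simple way to present "a figure" alongside its uncertainty is to summarise simulated forecasts as a central value with an interval around it. The sketch below assumes the forecast is available as a set of simulated draws. The draws here are randomly generated stand-ins rather than output from any real model, and the function names `quantile` and `summarise` are hypothetical.

```python
import random


def quantile(sorted_vals: list[float], q: float) -> float:
    """Empirical quantile with linear interpolation between order statistics."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def summarise(draws: list[float], level: float = 0.9) -> tuple[float, float, float]:
    """Return (median, lower, upper) for a central interval at the given level."""
    ordered = sorted(draws)
    tail = (1 - level) / 2
    return (quantile(ordered, 0.5),
            quantile(ordered, tail),
            quantile(ordered, 1 - tail))


# Stand-in draws for tomorrow's bed demand (purely illustrative numbers).
rng = random.Random(0)
draws = [max(0.0, rng.gauss(40, 6)) for _ in range(5000)]

median, lower, upper = summarise(draws, level=0.9)
print(f"Expected beds needed: {median:.0f} (90% interval: {lower:.0f} to {upper:.0f})")
```

The headline number is still there for those who need one, but the interval shows how much room to plan for either side of it.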

One of the big challenges we have faced so far is Information Governance – and I confess that there is a risk here of me entering into a rant.

<rant> I am quite surprised by the fact that I do not have access to data from other Trusts in the UK in my new NHS role. The issue here is numbers – I can provide better predictions if I have access to more data. While we have seen a good number of COVID-19 cases in our Trust, there have been many, many more across the country. In order to plan care, we need to know things like: what is the chance this person will need ventilation or ITU support, how long are they likely to be in hospital, will they need other types of specialised care? These questions require data at the individual level – how old are the people presenting, what other underlying conditions do they have, when did they first experience symptoms, etc. While I can develop forecasting models for our local data that exploit this information, it would be so much better if we could compare our experiences to those across the country. This could easily be done in a manner that preserved individual privacy – and surely it would be to the benefit of all. Perhaps these views might be regarded as controversial by some, but I would argue that it is in the interest of the public, and in particular future patients of our hospitals, to make centralised datasets available for analysis by local Trusts. In that way, we can learn from the experience of others how best to manage our care locally. This is so important because the NHS are now recognising the need for data scientists: units such as the one I now belong to are becoming more commonplace – we have the expertise in-house to handle complex analyses, so please let us get on with it! Again, maybe this is a communication issue, but someone needs to take the lead in this discussion at a national level </rant>.
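The point about numbers can be made concrete with a back-of-the-envelope calculation. The sketch below uses the usual normal approximation for the uncertainty in an estimated proportion, and it assumes, purely for illustration, that the underlying rate is the same in every Trust. The rate and the patient counts are made up, and `half_width` is a hypothetical helper.

```python
import math


def half_width(p: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% interval half-width for a proportion p estimated
    from n patients (normal approximation to the binomial)."""
    return z * math.sqrt(p * (1 - p) / n)


rate = 0.2            # illustrative chance of needing ventilation
local_n = 50          # illustrative number of local cases
pooled_n = 5000       # illustrative number of cases pooled nationally

local = half_width(rate, local_n)
pooled = half_width(rate, pooled_n)

print(f"local data:  {rate:.0%} +/- {local:.1%}")
print(f"pooled data: {rate:.0%} +/- {pooled:.1%}")
print(f"interval shrinks by a factor of {local / pooled:.0f}")
```

Under these assumptions, a hundred times more data gives an interval ten times narrower, because the half-width scales with one over the square root of the number of patients.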

In these sombre and strange times, it is nice to know that the substantial amount of time I spent in lectures and solving problems on integral calculus, differential equations, measure theory and probability did not entirely go to waste and there is good use for this knowledge in the real world!