News

NHSE highlights challenges with AI evaluations: potential for bias, adoption site capacity, variation between sites

NHS England has shared guidance on evaluating artificial intelligence projects and technologies with learnings from the ‘Artificial Intelligence (AI) in Health and Care Award’, which ran for four years until 2024 and supported the design, development and deployment of “promising” AI technologies.

NHSE states that the evaluations focused on eight key domains for each piece of AI tech: safety, accuracy, effectiveness, value, whether the tech addressed requirements and population needs at the site at which it was deployed, the reasons for implementation and barriers faced, feasibility for scaling up, and sustainability.

From these evaluations, a number of lessons are shared. Firstly, NHSE writes that national oversight of evaluation designs “will help ensure stakeholder expectations are met”, and that evaluations of AI technologies should include representatives from NICE, the UK National Screening Committee, specialist AI academics, and experts in patient and public engagement.

Secondly, NHSE emphasises the need to deploy and evaluate AI in co-production with technology suppliers, independent evaluators and adoption sites, including clinical and patient users. This “helped ensure accurate assessment of the accuracy of algorithms” as well as identifying opportunities for improvement. In terms of areas to improve, NHSE notes that AI technologies had already been selected and deployment plans made before the evaluations were commissioned. Additionally, adoption sites sometimes lacked the capacity or motivation to participate in the AI deployments, as the projects were led largely by the technology suppliers; as a result, some projects did not sufficiently consider how the technology would integrate into clinical pathways, or its downstream impact.

Another lesson shared by NHSE is that “at least two years should be allowed for evaluations of multi-site technology deployments”. For these evaluations, several months were required for each of three stages: designing and planning the evaluation; understanding the baseline situation and outcomes, including variation between sites; and bedding in the technology before beginning to measure impact.

“Future national programmes should encourage quasi-experimental, mixed-method evaluation designs,” NHSE writes, explaining that 11 of the 13 AI evaluations took this approach and were “more suited to AI implementation, allowing for rapid updates to the technology platforms”, as opposed to randomised controlled trials.

A further consideration for national programmes could be a stronger focus on assessing the impact of AI technologies on health inequalities; here, NHSE acknowledges the potential for AI algorithms to produce biased outputs and exacerbate health inequalities. Although some evaluators planned to include sensitivity analyses, this was not achieved consistently; a “more explicit focus” is therefore encouraged.

NHSE points out that rapid changes in the sector mean that teams rely “heavily” on guidance and resources from sources such as the MHRA, NICE, the Health Research Authority and NHSE itself; these organisations should therefore ensure that this information is regularly updated.

The guidance goes on to provide a more detailed review of the lessons learned at each stage of the evaluation process. For example, during the scoping stage, in which teams sought to establish the regulatory status of AI tech and its intended purpose, NHSE raises two specific issues observed for reflection: differing opinions over whether AI algorithms intended to improve the operational efficiency of clinical admin were classed as medical devices, and differing opinions between stakeholders as to the level of risk that AI algorithms in this category posed to patient care.

Other scoping-related advice includes engaging with adoption sites early and providing sufficient time to understand variation between different sites’ clinical practices and IT systems, as well as their capacity to participate in evaluation; and contacting the Health Research Authority for support if it is unclear whether an evaluation is categorised as research or a service evaluation.

With regards to designing and planning evaluation, NHSE shares that mixed qualitative and quantitative methods were “crucial” to understanding impact, and helped evaluators understand areas such as differences in clinical pathways and care practices before evaluation; differences in local IT systems and their usage; variations in calibration of AI technologies; differences in access to technology and training prior to evaluation; and variations in the use of AI outputs by clinicians or admin teams.

NHSE highlights the importance of pragmatism and flexibility in evaluation designs; the value of access to independent AI expertise and clinical leadership; and the “critical” nature of ensuring dialogue and written agreements on data sharing arrangements. Additionally, NHSE notes that the evaluation audience should be consulted when establishing the approach to health economic modelling; that patient ‘super users’ or patient organisations can provide valuable support when evaluating clinician-facing AI tech; that “several months” should be allowed for baseline analysis and bedding in of technologies; and that evaluations should include “thorough” analysis of outcomes in light of patient characteristics and clinician experience.

When it comes to conducting evaluations, NHSE recommends regular checks on the use of technologies and the quality of data being collected; and talking to patients to “build up a picture of the acceptability” of AI incorporation into a care pathway.

The document can be found in full here along with supportive case studies.

AI in healthcare: the wider trend

From University Hospitals Coventry and Warwickshire, we reported on how AI is being used to help reduce the number of missed appointments.

We explored other uses of AI across NHS organisations here, including insight from Bolton, East Suffolk and North Essex.

What are the biggest concerns for AI in healthcare, and what are the barriers to responsible AI? We asked our audience – check out the answer here.

Also from NHSE

We examined the most recent board papers from NHSE at the start of October, with discussion including key statistics for areas of focus within the NHS; plans for data improvements in specific areas; and comments around how the NHS can support the government’s planned shift towards preventative, digital and community-based care.

Also in October, we examined improvement guides and analytic tools published by NHSE, designed to help support flow through the emergency care pathway; to generate greater value for patients from theatres, elective surgery, perioperative care and outpatient services; and to improve medical consultant job planning.

September saw news around primary care implementation of the NHS patient safety strategy; winter priorities to launch NHSE’s clinical and operational improvement programme; and guidance on single point of access functions, including the role of digital and data.