
Google Health combines large language models with AI vision encoders for X-rays

Google Health has shared an update on their work to build “lightweight, multimodal generative AI models for medical imaging”, which use a combination of large language models (LLMs) and vision encoders for X-rays.

The researchers note that the work forms “an initial step towards a general purpose X-ray artificial intelligence system” and suggest that “LLM-aligned multimodal models can unlock the value of chest X-rays paired with radiology reports to solve a variety of previously challenging tasks”.

Capable of processing both images and text, the AI models are said to be well-suited for tasks such as disease classification, semantic search, and radiology report verification. They build on prior work to combine or ‘graft’ a vision encoder with a frozen large language model in order to perform a range of vision-language tasks relevant to medical imaging.
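For illustration, the “grafting” approach can be sketched in a few lines of PyTorch: a small trainable projection maps vision-encoder features into the token-embedding space of a frozen language model, so that the LLM can attend to X-ray features as if they were extra prompt tokens. The class name, dimensions and interface below are hypothetical and are not taken from Google’s implementation.

```python
import torch
import torch.nn as nn


class GraftedXrayModel(nn.Module):
    """Illustrative sketch: a trainable adapter 'grafts' a vision encoder onto a
    frozen LLM (names, dimensions and interfaces are hypothetical)."""

    def __init__(self, vision_encoder: nn.Module, llm: nn.Module,
                 vision_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        self.vision_encoder = vision_encoder      # e.g. a chest X-ray image encoder
        self.llm = llm                            # a pretrained decoder-only language model
        # The only newly trained component: projects image features into the
        # LLM's token-embedding space.
        self.adapter = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )
        for p in self.llm.parameters():           # the language model stays frozen
            p.requires_grad = False

    def forward(self, images: torch.Tensor, text_embeds: torch.Tensor):
        # images: (B, C, H, W); text_embeds: (B, T, llm_dim) prompt token embeddings
        vision_feats = self.vision_encoder(images)   # (B, N, vision_dim) patch features
        soft_tokens = self.adapter(vision_feats)     # (B, N, llm_dim) "image tokens"
        # Prepend the image tokens to the text prompt; assumes a HuggingFace-style
        # `inputs_embeds` interface on the frozen LLM.
        return self.llm(inputs_embeds=torch.cat([soft_tokens, text_embeds], dim=1))
```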

In order to train the models, images were paired with corresponding free-text radiology reports to allow the models to learn “the subtle nuances of medical imaging that would be difficult to capture with traditional binary labels”.
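One common way to learn from such pairs, and a reasonable stand-in for the training signal described above, is a CLIP-style contrastive objective that pulls each X-ray’s embedding towards the embedding of its own report and away from the other reports in the batch. The function below is a generic sketch rather than the paper’s exact loss.

```python
import torch
import torch.nn.functional as F


def contrastive_report_loss(image_embeds: torch.Tensor,
                            report_embeds: torch.Tensor,
                            temperature: float = 0.07) -> torch.Tensor:
    """CLIP-style loss over a batch of (X-ray, radiology report) pairs.

    image_embeds, report_embeds: (B, D) embeddings from an image encoder and a
    text encoder; matching rows correspond to the same study.
    """
    image_embeds = F.normalize(image_embeds, dim=-1)
    report_embeds = F.normalize(report_embeds, dim=-1)
    logits = image_embeds @ report_embeds.t() / temperature   # (B, B) similarity matrix
    targets = torch.arange(logits.size(0), device=logits.device)
    # Symmetric cross-entropy: each image should match its own report and vice versa.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2
```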

Christopher Kelly, clinical research scientist at Google, has commented that the use of images and free-text radiology reports to train the models “reduces many barriers, helps unlock the value of routinely collected medical imaging, and allows us to capture richer nuance beyond a simple binary label.”

In a research paper on the topic, the authors share a number of key advantages of their model, ELIXR; firstly, they note that it “achieves state-of-the-art performance for zero-shot classification, data-efficient classification, and semantic search of thoracic conditions across a range of datasets”.
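Zero-shot classification and semantic search both reduce to comparing embeddings in a shared image-text space. The sketch below, using hypothetical prompt wording and pre-computed embeddings, scores a set of thoracic conditions for a single study without any task-specific labels; semantic search is the same comparison run in reverse, ranking stored image embeddings against a free-text query.

```python
import torch
import torch.nn.functional as F


@torch.no_grad()
def zero_shot_scores(image_embed: torch.Tensor,
                     prompt_embeds: dict) -> dict:
    """Score conditions for one X-ray with no task-specific training.

    image_embed: (D,) embedding of the study; prompt_embeds maps a label such as
    "pneumothorax" to the (D,) embedding of a prompt like "chest X-ray showing
    pneumothorax" (wording is illustrative, not the paper's).
    """
    image_embed = F.normalize(image_embed, dim=-1)
    return {label: torch.dot(image_embed, F.normalize(p, dim=-1)).item()
            for label, p in prompt_embeds.items()}
```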

Secondly, adapting an image encoder to a large language model is “a fast and resource-efficient method of training compared to full fine-tuning of an LLM”; the researchers point out that building models on top of ELIXR can be “done rapidly to prototype new use cases, adapt to distribution shifts with a small amount of new training data, or use alternative publicly available LLMs”.
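A rough way to see why this is cheaper than full fine-tuning: only the adapter’s parameters need gradients and optimiser state, while the LLM’s (typically billions of) weights stay frozen. The snippet below, reusing the hypothetical GraftedXrayModel sketch above with placeholder modules, counts trainable versus frozen parameters before building the optimiser.

```python
import torch
import torch.nn as nn

# Placeholder modules standing in for a real X-ray encoder and a pretrained LLM.
vision_encoder = nn.Linear(1024, 1024)
llm = nn.Linear(4096, 4096)                      # in practice, billions of parameters

model = GraftedXrayModel(vision_encoder, llm)    # hypothetical class from the earlier sketch

trainable = [p for p in model.parameters() if p.requires_grad]
frozen = [p for p in model.parameters() if not p.requires_grad]
print(f"trainable (adapter + encoder) params: {sum(p.numel() for p in trainable):,}")
print(f"frozen (LLM) params: {sum(p.numel() for p in frozen):,}")

# Only the trainable subset receives gradients and optimiser state, which is why
# new use cases can be prototyped quickly and cheaply on top of the frozen LLM.
optimizer = torch.optim.AdamW(trainable, lr=1e-4)
```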

The third key advantage focuses on the potential that ELIXR’s combination of imaging and text has for a new generation of medical AI applications; the researchers note that whilst their study has demonstrated semantic search, visual question answering, and radiology report quality assurance, there are “countless potential applications across the medical domain that can be addressed using the proposed multimodal framework”.

The authors also reiterate the benefit of ELIXR using free-text radiology reports and images for training, noting that the use of routinely collected medical data means that AI systems can be developed “at a far greater scale and at lower overall cost than previously possible”.

At the start of the year, we covered how a team from Google Research and DeepMind published their evaluation of an AI application used to understand and generate language in a clinical context.

In November, we heard how Google Health and MEDITECH extended a pilot that embeds Google search and summarisation capabilities into the clinical workflow of an electronic health record.

In wider news around artificial intelligence, June saw the announcement of £21 million in funding for AI technologies across the NHS with the aim of speeding up diagnosis for conditions such as cancers, strokes and heart conditions.