Why It Is Important To Understand Multimodal Large Language Models In Healthcare 

The future of medicine is inextricably linked to the development of artificial intelligence (AI). Although this revolution has been brewing for years, the past few months marked a major change, as algorithms finally moved out of specialized labs and into our daily lives.

The public debut of Large Language Models (LLMs) such as ChatGPT, which became the fastest-growing consumer application of all time, has been a roaring success. LLMs are machine learning models trained on a vast amount of text data, which enables them to understand and generate human-like text based on the patterns and structures they’ve learned. They differ significantly from prior deep learning methods in scale, capabilities, and potential impact.

Large language models will soon find their way into everyday clinical settings, simply because the global shortage of healthcare personnel is becoming dire and AI will lend a hand with tasks that do not require skilled medical professionals. But even before this happens, and before a sufficiently robust regulatory framework is in place, we are already seeing how this new technology is being used in everyday life.

To better understand what lies ahead, let’s explore another key concept that will play a significant role in the transformation of medicine: multimodality.

Doctors and nurses are supercomputers, medical AI is a calculator

A multimodal system can process and interpret multiple types of input data, such as text, images, audio, and video, simultaneously. Current medical AIs only process one type of data, for example, text or X-ray images. 

However, medicine is multimodal by nature, as are humans. To diagnose and treat a patient, a healthcare professional listens to the patient, reads their health files, looks at medical images, and interprets laboratory results. This is far beyond what any AI is capable of today.

The difference between the two can be likened to the difference between a runner and a pentathlete. A runner excels in one discipline, whereas a pentathlete must excel in multiple disciplines to succeed.

Current LLMs are the runners: they are unimodal. Humans in medicine are champion pentathletes.

At the moment, most LLMs, such as GPT-4, are unimodal, meaning they can only analyze text. Although GPT-4 has been described as able to analyze images as well, for now it can only do so via its API.

From The Medical Futurist’s perspective, it’s clear that multimodal LLMs (M-LLMs) will arrive soon; otherwise, AI won’t be able to significantly contribute to the multimodal nature of medicine and care. When they do, it will signify the start of an era in which these systems will significantly reduce the workload of human healthcare professionals, but not replace them.

The future is M-LLMs

The development of M-LLMs will have at least three significant consequences:

1. AI will handle multiple types of content, from images to audio

An M-LLM will be able to process and interpret various kinds of content, which is crucial for a comprehensive analysis in medicine. We could list hundreds of examples of the benefits of such a system, but we will mention only a few in the following five categories:

  • Text analysis: M-LLMs will be capable of handling a vast number of administrative, clinical, educational and marketing tasks, from updating electronic medical records to solving case studies
  • Image analysis: another broad area in terms of potential use cases, which spans from reading handwritten notes to analysing radiology (ophthalmology, neurology, pathology, etc.) images
  • Sound analysis: M-LLMs will eventually become competent in disease monitoring, such as checking heart and lung sounds for abnormalities to ensure early detection, but sounds can also provide valuable information in mental health and rehabilitation applications
  • Video analysis: an advanced algorithm will be able to guide a medical student in virtual reality surgery training on how to aim precisely, move, and proceed, but videos could also be used to detect neurological conditions or to support patients communicating with sign language
  • Complex document analysis: this will include assistance in literature review and research, analysis of medical guidelines for clinical decision-making, and clinical coding, among many other forms of use

2. It will break language barriers

These M-LLMs will easily facilitate communication between healthcare providers and patients who speak different languages, translating between various languages in real time.

Specialist: “Can you please point to where it hurts?”

M-LLM (Translating for Patient): “¿Puede señalar dónde le duele?”

Patient points to lower abdomen.

M-LLM (Translating for Specialist): “The patient is pointing to the lower abdomen.”

Specialist: “On a scale from 1 to 10, how would you rate your pain?”

M-LLM (Translating for Patient): “En una escala del 1 al 10, ¿cómo calificaría su dolor?”

Patient: “Es un 8.”

M-LLM (Translating for Specialist): “It is an 8.”

3. Finally, the arrival of interoperability can connect and harmonise various hospital systems

An M-LLM could serve as a central hub that facilitates access to various unimodal AIs used in the hospital, such as radiology software, insurance handling software, Electronic Medical Records (EMR), etc. The situation today is as follows:

One company manufactures software for the radiology department, which uses a certain format of AI in its daily work. Another company’s algorithm works with the hospital’s electronic medical records, and yet another third-party supplier creates AI to compile insurance reports. However, doctors typically only have access to the system strictly related to their field: for example, a radiologist has access to the radiological AI, but a cardiologist does not. And of course, these algorithms don’t communicate with each other. If the cardiology department used an algorithm that analysed heart and lung signs, gastroenterologists or psychiatrists very likely wouldn’t have access to it, even though its findings may be useful for their diagnosis as well.

The significant step will come when M-LLMs eventually become capable of understanding the language and format of all these software applications and help people communicate with them. An average doctor will then be able to easily work with the radiological AI software, the AI software managing the EMRs, and the fourth, eighth, and any further AI used in the hospital.

This potential is very important because such a breakthrough won’t come about in any other way. No single company will come up with such software because no company has access to the AI data developed by the others. The M-LLM, however, will be able to communicate with these systems individually and, as a central hub, will provide a tool of immense importance to doctors.
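To make the central hub idea more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the `Hub` class, the backend names, and the stand-in functions are all hypothetical, and a real M-LLM would also have to translate between each vendor's formats rather than simply forward a request.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# Hypothetical type: a backend takes a request string and returns a text result.
Backend = Callable[[str], str]


@dataclass
class Hub:
    """Toy central hub that forwards a request to a registered backend."""

    backends: Dict[str, Backend] = field(default_factory=dict)

    def register(self, name: str, backend: Backend) -> None:
        # Make a separate system reachable under a single name.
        self.backends[name] = backend

    def ask(self, name: str, payload: str) -> str:
        # Fail loudly if the requested system was never connected.
        if name not in self.backends:
            raise KeyError(f"no backend registered for {name!r}")
        return self.backends[name](payload)


# Stand-ins for vendor systems (assumptions for illustration only);
# real ones would sit behind their own interfaces.
def radiology_ai(payload: str) -> str:
    return f"radiology finding for {payload}"


def emr_lookup(payload: str) -> str:
    return f"EMR summary for {payload}"


hub = Hub()
hub.register("radiology", radiology_ai)
hub.register("emr", emr_lookup)

# Any clinician can now reach both systems through one entry point.
print(hub.ask("radiology", "chest X-ray"))
print(hub.ask("emr", "the same patient"))
```

The point of the sketch is the single entry point: the cardiologist in the example above would no longer need separate access to each department's software.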

The transition from unimodal to multimodal AI is a necessary step to fully harness the potential of AI in medicine. By developing M-LLMs that can process multiple types of content, break language barriers, and facilitate access to other AI applications, we can revolutionize the way we practice medicine. The journey from being a calculator to matching the supercomputers we call doctors is challenging, but it is a revolution waiting to happen.

The post Why It Is Important To Understand Multimodal Large Language Models In Healthcare  appeared first on The Medical Futurist.
