Large language models (LLMs) are artificial intelligence models that use machine learning to process and generate human-like text after training on vast amounts of internet data to learn the patterns, structures and nuances of language.1 These models break text down into smaller components, such as sentences, phrases and words. Using probability-based algorithms, they combine these components into coherent, human-like text. Reinforcement Learning from Human Feedback (RLHF) is a process that allows LLMs to fine-tune their responses using human input. Over time, refining LLM responses with human feedback yields a model that more accurately represents human knowledge and speech.2
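To make the idea of probability-based text generation concrete, the following is a minimal sketch in which the next-token probabilities are invented purely for illustration; real LLMs learn such distributions over subword tokens with neural networks trained on vast corpora.

```python
# Toy illustration of next-token prediction: a model assigns a probability
# to each candidate token given the preceding context, then samples one.
# The probabilities below are invented for demonstration only.
import random

# Hypothetical conditional probabilities P(next token | two-token context).
next_token_probs = {
    ("the", "patient"): {"presented": 0.6, "denies": 0.3, "was": 0.1},
    ("patient", "presented"): {"with": 0.9, "to": 0.1},
    ("patient", "denies"): {"fever": 0.5, "pain": 0.5},
    ("patient", "was"): {"admitted": 1.0},
}

def sample_next(context):
    """Sample the next token from the conditional distribution."""
    dist = next_token_probs[context]
    tokens, weights = zip(*dist.items())
    return random.choices(tokens, weights=weights)[0]

tokens = ["the", "patient"]
for _ in range(2):
    tokens.append(sample_next((tokens[-2], tokens[-1])))
print(" ".join(tokens))  # e.g. "the patient presented with"
```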
LLMs have a wide range of uses, including text generation, question answering, content summarization, content creation, tutoring and coding assistance. Their sophistication has been highlighted in articles touting their ability to pass high-stakes exams, including law and medical board examinations.3 In medicine, and specifically within interventional radiology (IR), there are opportunities to adapt LLM applications to enhance clinical care. LLMs may provide solutions to some of IR's obstacles, ultimately improving patient care and promoting innovation.
In this article, we discuss current and potential applications in medical education, patient–physician communication, procedure planning, resource utilization and documentation.
Current state of LLMs
OpenAI published their first Generative Pre-trained Transformer (GPT) model in 2018.4 Since then, OpenAI has updated and improved the model over multiple versions. The latest version, GPT-4, was released in March 2023 and attracts over 1.8 billion visits per month. There are many other LLMs, including Google's Bard, Microsoft's Bing AI and Meta's LLaMA.5,6,7 However, GPT-4 is the most widely popularized and has formed the basis for many early tools built atop its base model. In particular, the release of the GPT-4 API and plug-in developer feature has allowed integration of GPT with third-party applications to create unique assistive tools, including data-analytic tools.
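As an illustration, a minimal sketch of calling the GPT-4 API through OpenAI's Python SDK might look like the following; the interface shown is the mid-2023 SDK (which has since been revised), and the prompt is an arbitrary example.

```python
# Minimal sketch of a GPT-4 API call via the openai Python SDK (mid-2023
# interface). Third-party tools wrap calls like this one.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder; never hard-code real keys

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a concise medical librarian."},
        {"role": "user", "content": "List three indications for uterine "
                                    "artery embolization."},
    ],
)
print(response["choices"][0]["message"]["content"])
```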
Impact within medical education
The ability to rapidly access medical information and communicate it in an accessible way can augment medical education and streamline curriculum development. In a 2023 study, GPT-4 was used to create readily accessible clinical vignettes and clinical case simulations, and its ability to model demographic diversity was assessed.8
Impact within clinical care
Within clinical care, LLMs hold substantial potential utility. They can assist in each of the following ways, among others:
- Expediting diagnosis and treatment
- Pre-procedural planning (steps leading to a procedure rather than visual mapping)
- Easing communication from physician to physician and physician to patient
- Quickly accessing medical literature
- Assisting in many steps within the innovation process
LLM capabilities in IR
Patient and subspecialty empowerment
LLM chat is now integrated into web browsers such as Microsoft Bing, allowing patients to query their medical conditions and treatment options. In today's paradigm of shared clinical decision-making, an understanding of all treatment options, such as learning about uterine artery embolization for the treatment of fibroids, allows patients to make fully informed decisions. For healthcare providers, LLMs may assist in clinical management by surfacing management guidelines.
Procedural planning
Pre-procedural care
LLM integration into electronic health records may become a reality with the recently announced co-innovation between OpenAI and Epic Systems.9 This integration could bring a myriad of new possibilities, some of which may positively impact IR clinical workflow. Perhaps the first instance of integration will be support for managing a provider's inbox. In another, LLMs may summarize a patient's status on a specialty-specific, dynamic dashboard by analyzing emergency notes, admission notes, progress notes, laboratory values and radiological reports, assisting clinicians in information synthesis.
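As a purely hypothetical illustration, not an existing Epic integration, such a chart-summarization step might be sketched as follows. The helper name summarize_for_ir, the note excerpts and the prompt wording are all assumptions, the mid-2023 openai SDK is used, and any real deployment would require de-identified data in a HIPAA-compliant environment (see Limitations).

```python
# Hypothetical sketch of a chart-summarization step for a specialty
# dashboard. All clinical text is invented; do not send real patient
# data to a non-HIPAA-compliant service.
import openai

def summarize_for_ir(notes):
    """Condense chart documents into an IR-oriented patient summary."""
    chart = "\n\n".join(notes)
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "Summarize this chart for an interventional "
                        "radiologist: active problems, relevant labs, "
                        "anticoagulation status and pending imaging."},
            {"role": "user", "content": chart},
        ],
    )
    return response["choices"][0]["message"]["content"]

print(summarize_for_ir([
    "ED note: 62M with melena, hemodynamically stable...",
    "Labs: Hgb 6.8 g/dL, INR 1.1...",
    "CT angiogram: active extravasation in the distal duodenum...",
]))
```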
Moreover, there may be a future in which LLM capabilities expand to include clinical management suggestions. We demonstrate one example use case in the setting of a common IR consult for acute gastrointestinal hemorrhage, wherein each additional note and laboratory value updates the LLM's recommendations to the clinician. Finally, LLMs could also support advocacy by healthcare professionals and patients alike, assisting in drafting appeals for denied preauthorization requests or contesting denied insurance claims for care deemed medically necessary.
Figure: Chart-constrained LLM recommendations.
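A minimal sketch of how such incrementally updated, chart-constrained recommendations might be wired together is shown below; the conversation-state pattern, prompts and clinical values are hypothetical, and the mid-2023 openai SDK is assumed.

```python
# Hypothetical sketch: each new note or lab value is appended to the
# conversation and the model is re-queried for an updated suggestion.
# Clinical data are invented.
import openai

messages = [{"role": "system",
             "content": "You assist an IR consultant managing acute "
                        "gastrointestinal hemorrhage. Update your "
                        "recommendation as new data arrives, citing the "
                        "data you relied on."}]

def update_recommendation(new_data):
    """Append new chart data and return the model's revised suggestion."""
    messages.append({"role": "user", "content": new_data})
    response = openai.ChatCompletion.create(model="gpt-4", messages=messages)
    reply = response["choices"][0]["message"]["content"]
    messages.append({"role": "assistant", "content": reply})  # keep history
    return reply

print(update_recommendation("Hgb 6.8 g/dL, melena, heart rate 110."))
print(update_recommendation("CTA: active extravasation, distal duodenum."))
```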
Postprocedure care
Postintervention chatbots have previously been used for surveillance of complications, though not widely implemented clinically. Similar deployment of LLM chatbots would be feasible; however, LLMs can state and perpetuate medical inaccuracies.10 As such, LLM-based postprocedure chatbots may still be far from clinical use. However, plug-in tools are currently being developed to restrict LLM information access to specific sources, which may improve chatbot accuracy.
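One possible shape for such a source-restricted chatbot is sketched below. The approved-source corpus, the naive keyword retrieval and the prompt wording are all assumptions; a production tool would use clinically vetted content and embedding-based search rather than keyword matching.

```python
# Hypothetical sketch of a source-restricted postprocedure chatbot: only
# passages retrieved from a vetted corpus are handed to the model.
import openai

APPROVED_SOURCES = {
    "site_care": "Keep the puncture site clean and dry for 24 hours...",
    "warning_signs": "Call the IR clinic for fever above 38.5 C, expanding "
                     "bruising, or bleeding at the access site...",
}

def retrieve(question):
    """Naive keyword match against approved passages only."""
    words = question.lower().split()
    return "\n".join(text for text in APPROVED_SOURCES.values()
                     if any(w in text.lower() for w in words))

def answer(question):
    context = retrieve(question)
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "Answer using ONLY the context below. If it does "
                        "not contain the answer, say so and direct the "
                        "patient to the IR clinic.\n\nContext:\n" + context},
            {"role": "user", "content": question},
        ],
    )
    return response["choices"][0]["message"]["content"]
```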
Ease of communication
A common cause of stress for patients is their inability to interpret radiological reports. Patients are notified when a radiological report is uploaded to their chart, often before their physician has had the chance to follow up and elaborate on the findings. LLMs show great utility in this space, with the ability to simplify the radiologist's medical jargon into a patient-friendly report. LLMs can effectively translate radiological reports to an 8th-grade reading level or below.11 Given the accessibility of medical documents in patient charts, LLM simplification of radiology reports could empower patients to better understand their conditions, reduce patient anxiety and decrease unscheduled calls from patients. Similarly, LLMs can generate specialty-specific summaries of radiological reports for referring providers.
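A minimal sketch of prompt-based report simplification follows; the report text is invented and the mid-2023 openai SDK is assumed.

```python
# Minimal sketch: rewrite a radiology report at roughly an 8th-grade
# reading level. The report below is an invented example.
import openai

REPORT = ("Interval decrease in the segment VII hepatic lesion status post "
          "transarterial chemoembolization; no new arterially enhancing foci.")

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "system",
         "content": "Rewrite this radiology report for a patient at an "
                    "8th-grade reading level or below. Do not add findings."},
        {"role": "user", "content": REPORT},
    ],
)
print(response["choices"][0]["message"]["content"])
```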
Limitations
Data privacy
Although OpenAI has taken steps to promote data privacy and website security, it does not follow HIPAA-compliant practices.12 OpenAI's current privacy policy acknowledges collecting uncensored personal data submitted to GPT services and using that data to train its LLMs. In its current state, GPT is not ready to be integrated into hospital systems and poses a threat to personal health information. However, OpenAI complies with the California Consumer Privacy Act and the EU's General Data Protection Regulation, demonstrating a commitment to data security and suggesting that HIPAA-compliant practices may follow.13
GPT hallucinations
A current shortcoming of GPT is the production of hallucinations: GPT-produced responses that seem entirely reasonable but are fabricated. GPT does not actually know what is right and wrong; it simply generates plausible text from the patterns in its training data. Fortunately, LLM frameworks such as LangChain can constrain LLMs to specific datasets (such as a medical chart), so that unknown answers produce an admission of "not known" rather than a hallucination.14
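The grounding pattern that frameworks such as LangChain implement can be illustrated in plain Python; the sketch below is not LangChain's actual API, only the underlying prompt discipline, and the chart excerpt is invented.

```python
# Plain-Python illustration of grounding: the model may answer only from
# the supplied chart excerpt and must say 'not known' otherwise.
def grounded_prompt(chart_excerpt, question):
    """Build a prompt that forbids answers beyond the supplied excerpt."""
    return (
        "Answer the question using only the chart excerpt below. If the "
        "excerpt does not contain the answer, reply exactly 'not known'.\n\n"
        f"Chart excerpt:\n{chart_excerpt}\n\nQuestion: {question}"
    )

print(grounded_prompt("INR 1.1 on 2023-07-20.", "What is the INR?"))
print(grounded_prompt("INR 1.1 on 2023-07-20.", "What is the creatinine?"))
# The second prompt should elicit 'not known' rather than a fabricated value.
```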
Trained data limitation
LLMs are limited to the data they were trained on; GPT-4's training data, for example, runs only through September 2021. Therefore, novel innovations or discoveries that occurred after the training cutoff will not be reported by the model. However, plug-ins such as ChatGPT's Bing-powered web browsing and various add-ons enable LLMs to actively search the web and relay information produced after the end of their training period.
The origin of training data may pose a potential limitation, too. Large volumes of training information come from well-funded institutions in wealthy, predominantly English-speaking countries, presenting a risk of bias.15
Conclusions
LLMs have the potential to profoundly impact healthcare. By improving efficiency, solving complex problems and streamlining tasks, LLMs can enhance patient care and patients' experience of the healthcare system. Their potential applications in clinical management, physician–patient communication, procedural planning and charting are promising. It is important that we continue to develop LLMs, particularly within IR, to fully harness this technology and enhance how we practice medicine.
References
- Alqahtani T, Badreldin HA, Alrashed M, Alshaya AI, Alghamdi SS, Bin Saleh K, et al. The emergent role of artificial intelligence, natural learning processing, and large language models in higher education and research. Res Social Adm Pharm. 2023 Aug;19(8):1236–1242.
- Christiano P, Leike J, Brown TB, Martic M, Legg S, Amodei D. Deep reinforcement learning from human preferences. arXiv. 2023 Feb 17.
- Alberts IL, Mercolli L, Pyka T, Prenosil G, Shi K, Rominger A, et al. Large language models (LLM) and ChatGPT: What will the impact on nuclear medicine be? Eur J Nucl Med Mol Imaging. 2023 Mar 9;1–4.
- Radford A, Narasimhan K, Salimans T, Sutskever I. Improving language understanding by generative pre-training. OpenAI. 2018 Jun 11.
- Rahaman MdS, Ahsan MMT, Anjum N, Rahman MdM, Rahman MN. The AI race is on! Google’s Bard and OpenAI’s ChatGPT head to head: An opinion article. SSRN. 2023.
- Stokel-Walker C. AI chatbots are coming to search engines: can you trust the results? Nature. 2023. doi:10.1038/d41586-023-00423-4.
- Wu C, Zhang X, Zhang Y, Weng Y, Xie W. PMC-LLaMA: Further finetuning LLaMA on medical papers. arXiv. 2023 May 20.
- Zack T, Lehman E, Suzgun M, Rodriguez JA, Celi LA, Gichoya J, et al. Coding inequity: Assessing GPT-4’s potential for perpetuating racial and gender biases in health care. medRxiv. 2023 Jul 17.
- Kunze KN, Jang SJ, Fullerton MA, Vigdorchik JM, Haddad FS. What’s all the chatter about? Bone Joint J. 2023 Jun 1;105-B(6):587–9.
- Geoghegan L, Scarborough A, Wormald JCR, Harrison CJ, Collins D, Gardiner M, et al. Automated conversational agents for post-intervention follow-up: a systematic review. BJS Open. 2021 Jul 6;5(4).
- Li H, Moon JT, Iyer D, Balthazar P, Krupinski EA, Bercu ZL, et al. Decoding radiology reports: Potential application of OpenAI ChatGPT to enhance patient understanding of diagnostic reports. Clin Imaging. 2023 Jun 8;101:137–41.
- Marks M, Haupt CE. AI chatbots, health privacy, and challenges to HIPAA compliance. JAMA. 2023 Jul 25;330(4):309–310. doi:10.1001/jama.2023.9458.
- OpenAI Security Portal [Internet]. [cited 2023 Jul 24]. Available from: https://trust.openai.com.
- Lobentanzer S, Saez-Rodriguez J. A platform for the biomedical application of large language models. arXiv. 2023 Jul 21.
- Li H, Moon JT, Purkayastha S, Celi LA, Trivedi H, Gichoya JW. Ethics of large language models in medicine and medical research. Lancet Digit Health. 2023 Jun;5(6):e333–5.