How Retrieval-Augmented Generation (RAG) Will Benefit Federal Healthcare
Introduction
Federal healthcare must evolve to meet changing health needs. Digital transformation has become a crucial factor in modernizing federal healthcare, including updating information technology systems, enhancing patient experiences, and providing comprehensive public health solutions. The federal government recognizes the need to leverage data and technology to improve healthcare delivery, and artificial intelligence (AI) is one of the frontier technologies in public healthcare delivery. Traditional large language models (LLMs), such as GPT-3, have remarkable capabilities in generating human-like text. However, because they are trained on static snapshots of text data, they lack recency and do not truly capture the meaning behind the data. These limitations lead to inaccurate responses, low-quality outputs, and an inability to handle queries that require context awareness. Retrieval-Augmented Generation (RAG) is considered a promising way to mitigate these weaknesses because it supplies accurate contextual information, improving the accuracy of LLMs while minimizing their hallucinations. Microsoft offers a rich environment for integrating RAG to handle the complexities of federal healthcare.
Overview of Retrieval-Augmented Generation Technology
A preliminary literature search shows that there is no universally accepted definition of the term "retrieval-augmented generation" (RAG). However, there is consensus that RAG broadly refers to a technique for developing generative artificial intelligence (AI) applications. RAG is commonly used in scenarios where large language models (LLMs) are connected to external sources of knowledge as a strategy to improve accuracy, quality, and performance. LLMs are language models that use pre-trained deep learning (DL) algorithms to generate human-language text or to perform other general-purpose language tasks, collectively known as natural language processing (NLP). While LLMs have unique capabilities in NLP tasks, they face limitations due to outdated knowledge, lack of transparency, untraceable reasoning processes, and generation of incorrect information (Gao et al., 2024). These limitations impede the widespread application of LLMs in real-world environments without appropriate safeguards. Consequently, RAG has become a promising solution for augmenting language models with knowledge from external sources to improve their accuracy and credibility. RAG works by integrating external data retrieval into the generative process; it is therefore best viewed as a methodology or paradigm built around LLMs whose primary purpose is to improve generative tasks. RAG combines two basic components: an LLM and a retrieval system, such as a search engine. Figure 1 shows the RAG process for responding to user queries. The user sends a query as input, the retrieval system performs indexing and retrieval over relevant documents, and the LLM then performs the generation task by combining the retrieved context with the prompt to formulate a response.
Figure 1. An illustration of the RAG process in the case of responding to a query (Gao et al., 2024).
However, the naïve RAG shown in Figure 1 has challenges, such as hallucinations, redundancy and repetition, and overreliance on augmented information. Advanced RAG and modular RAG have been developed with enhanced capabilities to address indexing challenges and to provide greater versatility and flexibility, respectively (Gao et al., 2024). Modular RAG is a specialized framework with additional modules, such as search, memory, and prediction modules, that are adapted to specific problem contexts.
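To make the process in Figure 1 concrete, the following is a minimal sketch of the naive RAG loop in Python. The embed, search, and generate helpers are illustrative assumptions standing in for whatever embedding model, vector index, and LLM a given deployment actually uses; they are not components named by Gao et al. (2024).

```python
# Minimal sketch of the naive RAG loop in Figure 1 (illustrative only).
# embed(), search(), and generate() are hypothetical stand-ins supplied by the caller.
from typing import Callable, List

def naive_rag_answer(
    query: str,
    embed: Callable[[str], List[float]],               # text -> embedding vector
    search: Callable[[List[float], int], List[str]],   # vector -> top-k document chunks
    generate: Callable[[str], str],                    # prompt -> LLM completion
    top_k: int = 4,
) -> str:
    # 1. Retrieval: embed the query and fetch the most relevant indexed chunks.
    query_vector = embed(query)
    context_chunks = search(query_vector, top_k)

    # 2. Augmentation: combine the retrieved context with the user question.
    prompt = (
        "Answer the question using only the context below.\n\n"
        "Context:\n" + "\n---\n".join(context_chunks) +
        f"\n\nQuestion: {query}\nAnswer:"
    )

    # 3. Generation: the LLM produces a grounded response.
    return generate(prompt)
```

Advanced and modular RAG variants refine each of these three steps, for example with re-ranking or memory modules, rather than replacing the overall pattern.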
Uses of RAG in Federal Healthcare
The Centers for Medicare & Medicaid Services (CMS) plans to accelerate the modernization of healthcare delivery by integrating advanced health IT (HIT). Medicare is a federal health insurance program designed to address the needs of people aged 65 and older, while Medicaid is a program run jointly by the federal and state governments to help citizens cover medical costs. To support healthcare delivery, the federal government has initiatives for enhancing patient safety, providing healthcare, improving access to quality care for vulnerable people, regulating healthcare markets, and promoting the acquisition of new knowledge. The CMS modernization initiatives involve collaboration with federal agencies and policymakers, including the U.S. Congress. RAG is one of the innovative technologies that can offer CMS enormous benefits and opportunities for improving healthcare delivery. According to Zhang and Boulos (2023), generative AI, such as OpenAI's ChatGPT, has profound implications for healthcare and medicine.
Uses of RAG in Clinical Administration and Decision Support
One of the areas where federal healthcare could benefit from RAG is the optimization of clinical administration and decision-making (Wang et al., 2023). AI chatbots have now been developed with general cognitive capabilities and can be used to engage patients and clinicians in discussing health conditions. However, existing chatbot engines provide generic responses to queries, which limits their application in areas that require dynamic clinical advice or personalized instructions for patients. RAG has emerged as a powerful technology for improving the specificity of user prompts, which enhances the responses these chatbots generate. RAG allows the integration of recent clinical data from clinical guidelines and other reliable sources to support faster diagnoses and treatment recommendations (Wang et al., 2023). For example, GPT-4 Turbo has been used to support clinical decision-making in the management of bipolar depression (Perlis et al., 2024). The researchers combined the standard LLM architecture with RAG as a strategy for incorporating evidence-based recommendations into clinical practice, and the study established the feasibility of incorporating clinical knowledge into LLMs to facilitate decision-making.
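Perlis et al. (2024) do not publish their implementation, but the basic pattern of grounding a clinical question in retrieved guideline text can be sketched as follows. The excerpt placeholders and the helper name are hypothetical and only illustrate how evidence-based recommendations could be injected into the prompt before generation.

```python
# Hypothetical sketch: grounding a clinical decision-support prompt in retrieved
# guideline text. The excerpts and helper name are illustrative, not from the study.
GUIDELINE_EXCERPTS = [
    "Excerpt 1: first-line pharmacotherapy recommendations ...",
    "Excerpt 2: monitoring and follow-up guidance ...",
]

def build_clinical_prompt(question: str, excerpts: list[str]) -> str:
    """Combine retrieved guideline excerpts with the clinician's question."""
    context = "\n\n".join(f"[Guideline {i + 1}] {e}" for i, e in enumerate(excerpts))
    return (
        "You are a clinical decision-support assistant. Base every recommendation "
        "strictly on the guideline excerpts provided and cite them by number.\n\n"
        f"{context}\n\nClinician question: {question}"
    )

print(build_clinical_prompt("What is an appropriate first-line treatment?", GUIDELINE_EXCERPTS))
```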
Microsoft Copilot is an AI tool that can be used to enhance clinical administration and improve the efficiency and productivity of healthcare professionals (Bitran, 2024). Generative AI models, or LLMs, can generate content from a user prompt, but on their own they are limited to generic responses.
Figure 2. Integrating Microsoft Copilot with RAG to add context to responses (Castelluccio, 2024).
The addition of RAG allows LLMs to overcome this problem, which broadens their applications in healthcare. An example is Azure AI Studio, which allows developers to customize OpenAI models and implement RAG patterns (Castelluccio, 2024). Microsoft's copilot tooling also enables developers to create their own copilots and embed these capabilities in their applications.
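As a hedged illustration of the kind of RAG pattern Castelluccio (2024) describes, the sketch below passes retrieved context to an Azure OpenAI chat deployment using the openai Python SDK. The endpoint, API key, deployment name, and API version are placeholder assumptions that must be replaced with the values provisioned in a given Azure resource.

```python
# Hedged sketch: passing retrieved context to an Azure OpenAI chat deployment.
# Endpoint, key, deployment name, and API version below are placeholders.
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",  # placeholder
    api_key="<your-api-key>",                                   # placeholder
    api_version="2024-02-01",                                   # assumed version string
)

retrieved_context = "..."  # text returned by your retrieval/search step
user_question = "Summarize the discharge instructions for this patient."

response = client.chat.completions.create(
    model="<your-gpt-deployment>",  # deployment name, not a model family name
    messages=[
        {"role": "system",
         "content": "Answer only from the provided context; say so if it is missing."},
        {"role": "user",
         "content": f"Context:\n{retrieved_context}\n\nQuestion: {user_question}"},
    ],
)
print(response.choices[0].message.content)
```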
Virtual Healthcare
Virtual healthcare is part of the contemporary healthcare system, and the application of mobile health (mHealth) and cloud computing is expected to transform healthcare delivery. LLMs have been augmented with RAG to enable domain-specific interactions in healthcare, including medical diagnosis. Thompson et al. (2023) proposed integrating LLMs with RAG for disease diagnosis. According to the researchers, disease phenotyping within electronic health records involves examining a patient's clinical history, and the traditional practice of encoding physician knowledge into rules is error-prone and tedious. LLMs offer a unique opportunity for automating this task, but they handle lengthy clinical documentation inefficiently. Thompson et al. (2023) therefore combined an LLM, RAG, and MapReduce to identify disease-related text, frame queries, and establish accurate diagnoses. The architecture consists of three components: retrieval, querying, and aggregation. Integrating RAG into the LLM architecture reduces the amount of text the model must process. The architecture was evaluated using pulmonary hypertension data, and the results showed that harnessing RAG and MapReduce enhances the analysis of patient documentation.
Figure 3. Overview of the LLM-based RAG architecture (Thompson et al., 2023).
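The retrieval-querying-aggregation design of Thompson et al. (2023) follows a map-reduce pattern that can be sketched as below. The retrieve and ask_llm helpers, the yes/no per-chunk queries, and the majority-vote aggregation are simplifying assumptions for illustration, not the authors' exact implementation.

```python
# Illustrative map-reduce sketch of RAG-based disease phenotyping:
# retrieve candidate note chunks, query the LLM over each chunk (map),
# then aggregate the per-chunk judgments into one patient-level call (reduce).
from collections import Counter
from typing import Callable, List

def phenotype_patient(
    notes: List[str],
    disease: str,
    retrieve: Callable[[List[str], str], List[str]],  # keep only disease-relevant chunks
    ask_llm: Callable[[str], str],                    # returns "yes" / "no" per chunk
) -> bool:
    # Retrieval: RAG narrows the notes so the LLM never sees the full chart.
    relevant_chunks = retrieve(notes, disease)

    # Map: query each relevant chunk separately to stay within context limits.
    votes = [
        ask_llm(f"Does this note indicate {disease}? Answer yes or no.\n\n{chunk}")
        for chunk in relevant_chunks
    ]

    # Reduce: aggregate per-chunk answers into a single decision.
    counts = Counter(v.strip().lower() for v in votes)
    return counts["yes"] > counts["no"]
```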
Patient Engagement and Personalized Patient Instructions
RAG can also help the federal government improve virtual healthcare and enhance the patient experience in telehealth programs. Ayers et al. (2023) evaluated the ability of ChatGPT to provide quality, empathetic responses to patient questions. It was hypothesized that increases in patient messages can lead to more work and burnout for healthcare professionals, and that AI assistants could alleviate the problem by helping to answer patient questions. The evaluation showed that chatbot responses were of better quality than physician responses, and it was concluded that chatbots can generate quality, empathetic responses to patient queries in online forums. Integrating advanced AI assistants into clinical settings will help the government alleviate staffing problems, especially in virtual healthcare settings.
Role of RAG in Clinical Research
The federal government plays an overarching role in overseeing medical research, regulating clinical trials, and ensuring the safety of participants while considering administrative procedures. RAG provides an opportunity for biomedical research pipelines to use LLMs by augmenting them with recent and accurate information. PaperQA is an example of a modular, RAG-based agent designed for scientific research (Lala et al., 2023). The system incorporates three components that enable scientific question answering: finding papers relevant to a question, collecting text from those documents, and generating answers with references.
Figure 4. An illustration of the workflow of PaperQA, a RAG-based solution for scientific research (Lala et al., 2023).
The key advantage of the system is its ability to use RAG tools to retrieve relevant full-text papers, which increases the speed and reduces the cost of scientific research. This makes it a potentially useful tool in biomedical research.
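A simplified view of the PaperQA workflow in Figure 4 can be expressed as a three-step pipeline, as in the sketch below. The function names and signatures are assumptions made for exposition; they are not the actual PaperQA API (Lala et al., 2023).

```python
# Illustrative three-step pipeline in the spirit of PaperQA: find papers,
# gather cited evidence passages, then generate an answer with references.
from typing import Callable, List, Tuple

Evidence = Tuple[str, str]  # (citation, supporting passage)

def scientific_qa(
    question: str,
    search_papers: Callable[[str, int], List[str]],          # query, limit -> paper texts
    extract_evidence: Callable[[str, str], List[Evidence]],  # paper, question -> cited passages
    answer: Callable[[str, List[Evidence]], str],            # question + evidence -> cited answer
    limit: int = 10,
) -> str:
    # 1. Find papers relevant to the question.
    papers = search_papers(question, limit)
    # 2. Gather supporting passages (with their citations) from each paper's full text.
    evidence: List[Evidence] = []
    for paper in papers:
        evidence.extend(extract_evidence(paper, question))
    # 3. Generate an answer that cites the gathered evidence.
    return answer(question, evidence)
```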
Zakka et al. (2023) proposed a similar RAG-based model for clinical medicine known as Almanac. It consists of a database engine for content storage, a browser for fetching information from the Internet, a retriever for encoding queries and reference materials, and a language model for extracting relevant information from the retrieved context.
Figure 5. Overview of Almanac, which uses external tools to retrieve relevant information and frame responses with references (Zakka et al., 2023).
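The four Almanac components can be pictured as a small composition of callables, as in the sketch below. The class and method names are assumptions for exposition and are not the interfaces used by Zakka et al. (2023).

```python
# Illustrative wiring of an Almanac-style assistant: browser fetches reference
# material, database stores it, retriever ranks it, and the LLM answers from it.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class AlmanacStyleAssistant:
    store: Callable[[str], None]          # database engine: persist fetched content
    browse: Callable[[str], List[str]]    # browser: fetch reference material from the web
    retrieve: Callable[[str], List[str]]  # retriever: encode the query and rank stored references
    generate: Callable[[str], str]        # language model: answer from retrieved context

    def answer(self, query: str) -> str:
        for page in self.browse(query):             # fetch and cache fresh reference material
            self.store(page)
        context = "\n".join(self.retrieve(query))   # select the most relevant stored passages
        return self.generate(f"Context:\n{context}\n\nQuestion: {query}")
```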
A key initiative of the U.S. federal government is the plan to improve transparency in clinical trials by expanding trial registries and promoting open data sharing. The U.S. Department of Health and Human Services (HHS) and the National Institutes of Health (NIH) intend to improve the integrity of medical research while addressing safety concerns. Since trial registries are web-based platforms, they must provide accurate information to the scientific community and the public. Generative AI, and RAG in particular, offers unique opportunities for these efforts. One area where RAG can offer benefits is subject screening during clinical trials (Unlu et al., 2024). Subject screening is a labor-intensive and error-prone task in virtually all clinical trials, and the advent of LLMs and NLP offers advanced capabilities for improving the quality of clinical research. In one of the pioneering studies, Unlu et al. (2024) demonstrated the use of GPT-4 for clinical trial screening. The researchers used the language capabilities of GPT-4 to process external data, such as patients' clinical notes, and then used the RAG architecture to leverage those notes as an external data source that supplies the most relevant context. The workflow consisted of four components: data load, data split, vector embeddings, and question answering (Unlu et al., 2024). In the first step, the user query was converted to vector embeddings; the most relevant chunks were then retrieved before the prompt was prepared with the question and the relevant content. Overall, the study demonstrated the advantages of combining GPT with the RAG architecture in terms of improving efficiency and reducing costs in clinical trial recruitment. However, the approach has potential drawbacks, including the need to ensure that GPT receives relevant clinical context as input in order to yield accurate responses. A potential improvement is to use low-cost techniques, such as metadata filtering, to target specific clinical notes and refine searches; frameworks such as LangChain and LlamaIndex can be useful in implementing these improvements.
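The four-step screening workflow reported by Unlu et al. (2024), together with the suggested metadata filtering, can be sketched as follows. All helper names, the note-type field, and the similarity scoring are assumptions for illustration rather than the study's implementation.

```python
# Hedged sketch of a four-step screening workflow (load, split, embed, answer)
# with a low-cost metadata filter applied before retrieval.
from typing import Callable, Dict, List

def screen_patient(
    notes: List[Dict],                                        # 1. data load: notes with metadata
    criterion_question: str,                                  #    e.g. an eligibility criterion
    split: Callable[[str], List[str]],                        # 2. data split: chunk long notes
    embed: Callable[[str], List[float]],                      # 3. vector embeddings
    similarity: Callable[[List[float], List[float]], float],  #    similarity between vectors
    generate: Callable[[str], str],                           # 4. question answering over top chunks
    note_type: str = "cardiology",                            # metadata filter (assumed field)
    top_k: int = 5,
) -> str:
    # Metadata filtering narrows the search to the most relevant note types.
    filtered = [n for n in notes if n.get("type") == note_type]

    # Chunk and embed only the filtered notes, then rank chunks against the criterion.
    chunks = [c for n in filtered for c in split(n["text"])]
    query_vec = embed(criterion_question)
    ranked = sorted(chunks, key=lambda c: similarity(embed(c), query_vec), reverse=True)

    # Build the screening prompt from the top chunks and ask the LLM.
    context = "\n---\n".join(ranked[:top_k])
    prompt = (f"Eligibility criterion: {criterion_question}\n\n"
              f"Patient notes:\n{context}\n\nDoes the patient meet the criterion?")
    return generate(prompt)
```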
Conclusion
AI-powered healthcare is one of the key innovations with the potential to revolutionize federal healthcare, from complex policy-making to administration, clinical support, and decision-making. Generative AI and LLMs have already proven useful in generating content from a prompt. However, these tools are limited to generic responses, which do not work well in dynamic, real-world healthcare scenarios. Integrating RAG will overcome these limitations and pave the way for widespread applications in decision-support systems, virtual healthcare, and medical research, while streamlining patient referrals and personalization.
References
Ayers, J. W., Poliak, A., Dredze, M., Leas, E. C., Zhu, Z., Kelley, J. B., Faix, J. D., Goodman, A. M., Longhurst, C. A., Hogarth, M., & Smith, D. M. (2023). Comparing physician and artificial intelligence chatbot responses to patient questions posted to a public social media forum. JAMA Internal Medicine, 183(6), 589-596. http://dx.doi.org/10.1001/jamainternmed.2023.1838
Bitran, H. (2024, March 11). Azure AI Health Bot helps create copilot experiences with healthcare safeguards. https://azure.microsoft.com/en-us/blog/azure-ai-health-bot-helps-create-copilot-experiences-with-healthcare-safeguards/
Castelluccio, C. (2024, January 15). Building your own copilot – yes, but how? (Part 1 of 2). https://techcommunity.microsoft.com/t5/educator-developer-blog/building-your-own-copilot-yes-but-how-part-1-of-2/ba-p/4029571
Gao, Y., Xiong, Y., Gao, X., Jia, K., Pan, J., Bi, Y., Sun, J., Guo, M., & Wang, H. (2024). Retrieval-augmented generation for large language models: A survey. arXiv preprint arXiv:2312.10997.
Lala, J., O’Donoghue, O., Shtedritski, A., Cox, S., Rodriques, S. G., & White, A. D. (2023). PaperQA: Retrieval-augmented generative agent for scientific research. arXiv preprint arXiv:2312.07559.
Perlis, R. H., Goldberg, J. F., Ostacher, M. J., & Schneck, C. D. (2024). Clinical decision support for bipolar depression using large language models. Neuropsychopharmacology, 1-5. https://doi.org/10.1038/s41386-024-01841-2
Thompson, W. E., Vidmar, D. M., De Freitas, J. K., Pfeifer, J. M., Fornwalt, B. K., Chen, R., Altay, G., Manghnani, K., Nelsen, A. C., Morland, K., Stumpe, M. C., & Miotto, R. (2023). Large language models with retrieval-augmented generation for zero-shot disease phenotyping. In 1st Workshop on Deep Generative Models at NeurIPS. https://doi.org/10.48550/arXiv.2312.06457
Unlu, O., Shin, J., Mailly, C. J., Oates, M. F., Tucci, M. R., Varugheese, M., Wagholikar, K., Wang, F., Scirica, B. M., Blood, A. J., & Aronson, S. J. (2024). Retrieval augmented generation enabled generative pre-trained transformer 4 (GPT-4) performance for clinical trial screening. medRxiv, 1-22. https://doi.org/10.1101/2024.02.08.24302376
Wang, C., Ong, J., Wang, C., Ong, H., Cheng, R., & Ong, D. (2023). Potential for GPT technology to optimize future clinical decision-making using retrieval-augmented generation. Annals of Biomedical Engineering. https://doi.org/10.1007/s10439-023-03327-6
Zakka, C., Chaurasia, A., Shad, R., Dalal, A. R., Kim, J. L., Ashley, E., Boyd, J., Boyd, K., Hirsch, K., Langlotz, C., Nelson, J., & Hiesinger, W. (2023). Almanac: Retrieval-augmented language models for clinical medicine. NEJM AI, 1(2), 1-23.
Zhang, P., & Boulos, M. N. K. (2023). Generative AI in medicine and healthcare: Promises, opportunities and challenges. Future Internet, 15(9), 286. https://doi.org/10.3390/fi15090286