The Advantage of Multimodal AI In Healthcare

Innovative digital solutions have transformed the healthcare sector by supporting provider collaboration, reducing medical errors, enabling telemedicine, and facilitating informed decision-making. Currently, interconnected health information management systems help streamline care coordination and ease communication across multi-disciplinary teams. Technology empowers patients and primary caregivers through timely access to crucial health information and educational programs promoting treatment or medication adherence. Despite these advances, human health issues remain complex and multifaceted with rapidly evolving needs. Health facilities collect and process pertinent data from diverse sources, including electronic health records (EHR), wearable devices, payment processors, public exchange marketplaces, laboratory results, clinical observations, and medical imaging.

The healthcare system simultaneously tracks personal and population health metrics to ensure targeted service delivery. For instance, healthcare organizations monitor treatment outcomes to ensure safe and cost-effective patient care while examining community-level socioeconomic barriers to identify prevailing needs. As a result, facilities maintain fragmented databases containing high-dimensional structured and unstructured records that present significant processing and management challenges. Emerging artificial intelligence (AI) technologies offer powerful tools for providers to draw actionable insights in real time. This paper explores the advantages of a new AI paradigm, multimodal AI. The discussion hypothesizes that adopting technologies such as Microsoft CoDi and AutoGen improves the accuracy of healthcare outcomes.

Background and Overview: Multimodal AI

Multimodal AI in Healthcare

Multimodal AI is a developing architecture that combines various data types and intelligent processing algorithms or agents to achieve higher performance. According to Rao (2024), multimodal AI advances beyond unimodal learning models, which process and extract insights from a single data modality. Acosta et al. (2022) indicated that most current AI healthcare applications address tasks requiring a single data modality, such as analyzing computed tomography (CT) scans to summarize diagnostic findings. However, providers rely on holistic information in the clinical setting to make informed treatment decisions and ensure care continuity. Cai et al. (2019) agreed that patient treatment approaches supported by a single data source cannot cope with the rapidly evolving care needs within vulnerable populations. Therefore, single-modality AI has limitations in the real world, where multiple data sources interact and co-exist. Tang et al. (2023) added that chaining modality-specific models is challenging and slows processing: this approach can introduce consistency, synchronization, and alignment issues for post-processed outputs from multiple data streams. Multimodal AI overcomes these challenges by efficiently accepting inputs in different data formats and performing broader tasks. Multimodal AI also addresses issues associated with traditional computer-aided medical expert systems, which cannot dynamically update their underlying processing semantics using emerging knowledge (Cai et al., 2019). As such, multimodal AI offers a more adaptable and holistic foundation for clinical decision support than its unimodal predecessors.

Multimodal AI Categories in Healthcare

Three multimodal AI categories have been discussed in the literature. According to Cai et al. (2019), the first group comprises intelligent tools offering accurate diagnostic guides and treatment recommendations for patient populations. These systems integrate internal hospital data with external information from government agencies, Internet databases, and community resources to address health challenges. The second classification includes multimodal systems that fuse data from different departments and institutions (Cai et al., 2019). Such implementations leverage population-level data to enhance treatment accuracy and ensure excellent patient outcomes. The last category comprises dynamic and adaptive multimodal AI built on a panoramic cross-border data fusion architecture. Cai et al. (2019) asserted that this approach is ideal for real-world applications since it accounts for differences in health data among service providers. Therefore, the system adapts to match service preferences and available resources.

Multimodal AI Tools: Microsoft CODI and AutoGen

Healthcare organizations can leverage Microsoft Composable Diffusion (CoDi) and AutoGen to implement multimodal AI that enhances patient outcomes and organizational performance. Tang et al. (2023) characterized CoDi as the first generative AI model that simultaneously processes and generates high-quality output from multiple modalities (see Figure 1). Limitations in applying unimodal streams in the real world influenced CoDi's development. The solution relies on an innovative composable generation technique that establishes a shared semantic space for the different data modalities (Tang et al., 2023). It also addresses gaps in content alignment during the diffusion process by incorporating a cross-attention module for each diffuser and an environment encoder. This capability allows CoDi to produce synchronized content containing any-to-any data combinations from video, images, audio, and text. For instance, according to Tang et al. (2023), the technology can produce temporally aligned video and audio. Figure 1 below illustrates how CoDi accepts text prompts, audio, and video inputs to generate a mixture of output formats. The healthcare sector can use such tools to generate accurate patient and provider educational material based on multiple data streams.

Figure 1. Graphical illustration of CoDi’s capabilities      

Source: (Tang et al., 2023)

AutoGen is a powerful multimodal AI tool that applies to healthcare. According to AutoGen (2024), this open-source solution leverages multiple agents to streamline dynamic and complex systems. Agents are entities that accept inputs and generate outputs based on underlying algorithmic architectures that model the real world (AutoGen, 2024). AutoGen abstracts agents and enables collaboration among them, easing the implementation of complex systems. Moreover, AutoGen enables highly customizable and modular implementations. For instance, the framework allows users to extend agent capabilities with additional components, and organizations can combine sophisticated agents to address data processing challenges that face traditional IT systems. Figure 2 provides a visual example of the built-in ConversableAgent in AutoGen. The agent supports crucial components, including a large language model (LLM), a code executor, a function executor, and a human-in-the-loop element. Users can customize these components based on the intended system function. AutoGen also allows developers to assign roles and responsibilities to agents, allowing them to exchange information and insights autonomously.

Figure 2. Block diagram of the built-in ConversableAgent in AutoGen     

Source: (AutoGen, 2024).     
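To make this concrete, the following minimal Python sketch instantiates a single ConversableAgent in the style of AutoGen's introductory tutorial (AutoGen, 2024). The model name, environment variable, and clinical prompt are illustrative assumptions rather than a validated clinical configuration.

import os
from autogen import ConversableAgent

# Illustrative LLM configuration; the model choice and API key variable are assumptions.
llm_config = {"config_list": [{"model": "gpt-4", "api_key": os.environ.get("OPENAI_API_KEY")}]}

# A single LLM-backed agent that runs without human input.
assistant = ConversableAgent(
    "clinical_assistant",
    system_message="Summarize patient vitals and flag abnormal readings.",
    llm_config=llm_config,
    human_input_mode="NEVER",  # "ALWAYS" would keep a provider in the loop
)

reply = assistant.generate_reply(
    messages=[{"role": "user", "content": "BP 160/100 mmHg, HR 110 bpm. Anything concerning?"}]
)
print(reply)

Switching human_input_mode to "ALWAYS" activates the human integration component shown in Figure 2, requiring provider confirmation at each turn.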

Data Modalities in Healthcare

Complexities in health service delivery have led to the emergence of diverse data modalities within care settings. Cai et al. (2019) attributed the increase in multimodal health information to digitization trends within the sector, leading to healthcare big data. The authors stated that generated data include real-time precision health information, medical research data, and information from social networks. Leveraging these information sources allows healthcare organizations to develop and tailor efficient and accurate services. Cai et al. (2019) also noted significant differences among medical data sources due to the diversity of the generation and collection methods adopted. For instance, the article highlighted that text-based data from the internet exhibits discrete characteristics, while imaging data from within facilities follows a natural random distribution. Therefore, developing multimodal data analysis techniques is essential for knowledge fusion and holistic decision-making.

Data modalities in healthcare also arise from diverse information sources. Shaik et al. (2023) identified crucial health data sources, including EHRs, medical images, wearable technology, sensors, and outcomes from environmental analysis, genomic sequencing, and community or behavioral measures (see Figure 3). According to the authors, these modalities contain raw unstructured data in different formats. Acosta et al. (2022) attributed data modalities in healthcare to the widespread adoption of biosensors and expanded data capture techniques. Multimodal AI uses different techniques to transform data into meaningful information, such as categorizing medical images and extracting features from sensor and wearable device data.

Figure 3. Data modalities in smart healthcare delivery

Source: (Shaik et al., 2023)

Microsoft CoDi and AutoGen can help facilities address data modality issues in health information systems. CoDi can extract data from the EHR, a centralized repository for pertinent patient and treatment data comprising prescribed medications, patient history, laboratory and imaging results, clinical notes, and other measurements. This data can be combined with external information from the Internet to produce targeted health education programs that improve treatment self-efficacy. The multi-agent conversation framework in AutoGen introduces new possibilities in health management. According to AutoGen (2024), this unified framework automates chats and messages between agents with varying capabilities, allowing the agents to perform tasks autonomously or with human feedback. Facilities can use these tools to build AI assistants for physicians and to develop reference manuals based on clinical practice guidelines for diagnostic purposes.
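As a hedged sketch of that multi-agent pattern, the snippet below pairs two hypothetical agents, a records summarizer and a diagnostic drafter, through AutoGen's initiate_chat method; the agent names, system messages, and turn limit are assumptions for illustration only.

import os
from autogen import ConversableAgent

llm_config = {"config_list": [{"model": "gpt-4", "api_key": os.environ.get("OPENAI_API_KEY")}]}

# Hypothetical agents whose roles are assigned through system messages.
records_agent = ConversableAgent(
    "records_agent",
    system_message="You summarize relevant patient history concisely.",
    llm_config=llm_config,
    human_input_mode="NEVER",
)
physician_assistant = ConversableAgent(
    "physician_assistant",
    system_message="You draft differential diagnoses from clinical findings.",
    llm_config=llm_config,
    human_input_mode="NEVER",
)

# The agents exchange messages autonomously for a bounded number of turns.
result = records_agent.initiate_chat(
    physician_assistant,
    message="Patient reports chest pain; history includes hypertension and smoking.",
    max_turns=2,
)

Setting human_input_mode to "ALWAYS" on either agent would insert human feedback into the loop, matching the autonomous-or-supervised operation described above.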

Advantages of Multimodal AI in Healthcare

Multimodal AI solutions such as Microsoft CoDi and AutoGen significantly benefit healthcare institutions and improve patient outcomes. This section provides an in-depth analysis of each advantage.

Empowers Precision Medicine

Multimodal AI enables precision medicine by facilitating patient-centered care and personalized therapies. Kosorok and Laber (2019) defined precision medicine as an approach focused on improving healthcare quality through individualized processes aligned with unique patient needs. The approach relies on formalized treatment techniques based on updated patient information to make appropriate recommendations regarding prescribed medication, dosage, drug administration, and dietary or lifestyle changes. Kosorok and Laber (2019) noted that this framework encompasses broad areas, including genetic analysis, drug development, provider-patient communication, and evidence-based practice. According to Acosta et al. (2022), multimodal AI generates crucial personalized omics and markers that providers can analyze to understand heterogeneous conditions such as cancer. As such, multimodal AI will ensure successful patient outcomes in precision medicine by providing adequate clinical decision support and timely critical insights drawn from diverse data streams.

Enhanced Patient Monitoring Systems

Health facilities actively monitor patients to assess treatment outcomes and improve service delivery. For instance, Hernandez et al. (2021) emphasized the importance of monitoring post-operative cardiovascular patients since they may develop life-threatening complications such as respiratory failure and hypotension. However, this need has multiplied data modalities because most intensive care units deploy monitoring devices and sensors that collect nearly 10,000 data points per second (Hernandez et al., 2021). This voluminous data limits real-time analysis and timely response. Multimodal AI can ingest data from monitoring devices, the EHR, and patient history records, discovering complex patterns in a patient's physiological data that human providers may be unable to identify. These solutions also offer predictive analytics that may prove essential in improving postoperative care and reducing patient mortality. A simplified illustration of streaming anomaly detection appears below.
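The sketch that follows is a deliberately simple stand-in for this pattern-discovery capability, not the tensor-based method of Hernandez et al. (2021): it flags deviations in a synthetic heart-rate stream using a rolling z-score, where the window size and alert threshold are arbitrary assumptions.

import numpy as np

# Simulate ten minutes of 1 Hz heart-rate readings with a late anomaly.
rng = np.random.default_rng(0)
heart_rate = 75 + rng.normal(0, 2, 600)
heart_rate[480:] += 25  # synthetic tachycardic episode

# Rolling z-score: compare each reading to the mean/std of its trailing window.
window = 60
means = np.convolve(heart_rate, np.ones(window) / window, mode="valid")
stds = np.array([heart_rate[i:i + window].std() for i in range(len(means))])
z = (heart_rate[window - 1:] - means) / stds
print("first alert indices:", np.flatnonzero(np.abs(z) > 4)[:5])

A production monitoring system would fuse many such channels and learn cross-signal patterns, but the thresholding logic conveys the core idea of continuous, automated surveillance.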

Reduce Medical Errors and Adverse Outcomes

Multimodal AI also helps reduce medical errors and adverse outcomes. According to Pierre et al. (2023), these systems combine medical images, patient records, tissue analysis, and molecular profiling to recommend the best treatment approach tailored for an individual patient. Facilities can leverage these tools to develop evidence-based treatment reference manuals and AI assistants that offer providers recommendations based on presenting signs and symptoms. Pierre et al. (2023) noted that multimodal AI outputs can augment expert interpretations and opinions to enable safe care delivery. As observed, predictive capabilities built on multiple data streams can help prevent complications through appropriate and timely interventions.    

Streamline Healthcare Workflows

Healthcare organizations have complex and fragmented workflows that vary from one department to another. For instance, textual data from EHR records has discrete characteristics, while imaging data from the radiology unit follows a random distribution (Cai et al., 2019). Implementing tools like Microsoft CoDi will streamline these dynamic systems by simultaneously processing such different inputs and producing coherent outputs. Facilities could also leverage multimodal AI in resource planning and scheduling to ensure optimal utilization while freeing up crucial resources. For example, multimodal AI can fuse employee data and service demand measures to create ideal work schedules and balanced staffing ratios.

Drug Discovery, Research, and Development

Pharmacological approaches remain a popular treatment choice for many illnesses. Pharmaceutical companies can implement multimodal AI algorithms to support drug research, clinical trials, and medication development. Luo et al. (2024) indicated that traditional drug development procedures utilize knowledge of molecules and proteins from diverse sources, including chemical structures, knowledge bases, structured datasets, and unstructured literature evidence. Processing this information is time-consuming and costly. Vermeulen et al. (2022) estimated that most drug therapies take nearly 12 years from protein discovery to regulatory approval. Multimodal AI presents viable opportunities for reducing these turnaround times and enhancing therapy efficacy. Vermeulen et al. (2022) recommended using multimodal imaging AI tools to analyze molecular information effectively. This approach will facilitate informed decision-making, reduce costs, and improve safety by eliminating undesirable compounds early in the pipeline.

Improved Patient Diagnosis

Providers require holistic information to make accurate patient diagnoses; mistakes could lead to incorrect treatment, adverse reactions, and higher mortality. According to Cui et al. (2023), a single patient visit may generate data and insights in different modalities, including images, laboratory test results, and clinical notes. This heterogeneous data offers a high-level overview that supports clinical decisions relating to disease diagnosis and prognosis (Cui et al., 2023). Providers may face significant challenges in efficiently analyzing this rich and complementary information, creating the need for multimodal AI solutions that can abstract and fuse complex associations within high-dimensional data. Tools such as Microsoft CoDi can capture these different data streams, extract relevant features, and fuse the outputs. Implementing these technologies in the clinical setting will improve treatment outcomes and patient satisfaction.

Enabling Remote Health Services

Multimodal AI is enabling remote health services delivered over digital frameworks. Technology has bridged geographical divides through interconnected networks and devices. Providers can rely on wearable devices, sensors, and patient-reported data to deliver telehealth services, with multimodal AI tools fusing insights from these sources and recommending interventions. Providers can also use these technologies to remotely monitor outcomes for patients recovering at home. Combining data from these sources will allow providers to assess medication adherence and the impact of recommended lifestyle modifications. As such, multimodal AI will help improve access to healthcare by reducing prevailing social and economic barriers, especially in underserved areas.

Real-World Applications and Opportunities

The advantages of multimodal AI lend themselves to various real-world applications and opportunities. This section highlights essential technology uses that enhance healthcare services.

Clinical Decision Support

Multimodal AI provides in-depth data-driven insights for informed decision-making. According to Chen et al. (2024), care providers can make decisions with higher accuracy and efficiency, which leads to improved overall quality of care and patient outcomes. Additionally, AI models can utilize multi-dimensional data to build predictive models for disease evaluation, helping healthcare professionals make better decisions about patient care plans and treatment options based on predicted disease trajectories (Chen et al., 2024). The authors further indicate that healthcare professionals deploy AI to help them make informed decisions regarding the most impactful research directions based on patient and community interests and challenges. Multimodal AI performs a more comprehensive data exploration to provide valuable insights that guide healthcare scholars and policymakers in improving care quality and delivery.

Remote Patient Monitoring 

AI enhances the opportunity to simulate the healthcare setting in an individual's home using biosensors that facilitate continuous monitoring and analytics. Multimodal AI combines physiological metrics from wearable sensors, ambient sensor readings, and EHR data to create a personalized remote monitoring experience. According to Acosta et al. (2022), multimodal remote patient monitoring showed feasibility and safety when applied to acute diseases such as COVID-19. Additionally, Topol (2023) indicates that applying multimodal AI for remote monitoring facilitates remote dispatching of medical personnel, reducing hospital admissions. This approach reduces healthcare costs and minimizes exposure to nosocomial infections and medical errors in hospital settings while enhancing comfort, emotional support, and convenience.

Diagnostics and Findings Generation

Multimodal AI leverages input from diverse sources, including medical images, clinical notes, and lab results, enhancing diagnostic accuracy and the identification of patterns and correlations. According to Chen et al. (2024), AI facilitates enhanced data synthesis, providing healthcare professionals with a comprehensive outlook on the patient's health status and ensuring more precise diagnostics. Chen et al. (2024) also describe growing interest in deploying AI-powered multi-task diagnostic models and multimodal data fusion to predict neurodegenerative diseases. Such multimodal AI integrates medical imaging, clinical, sensor, genetic, and cognitive data to assess an individual's health status and disease risk, identifying complex patterns related to early-stage disease development.

Community Health Status Surveillance

The COVID-19 pandemic accelerated the implementation of community health status surveillance, revealing the impact of multimodal AI models on pandemic preparedness and response. Acosta et al. (2022) add that national and state-level disease surveillance necessitates integrating multimodal data from movement maps, mobile phone use, and health delivery data to forecast disease outbreaks and detect latent cases. For instance, Acosta et al. (2022) describe the Digital Engagement and Tracking for Early Control and Treatment (DETECT) study, which analyzes data from wearables to facilitate rapid detection of fast-spreading viral illnesses. Topol (2023) also describes individualized spatiotemporal, real-time risk assessment surveillance that uses geolocation, symptoms, vaccination status, wearable sensors, and other data layers. In this regard, multimodal AI models provide accurate analysis for enhanced health status detection and surveillance.

Digital Clinical Trials

Clinical trials are essential in healthcare for establishing the efficacy of treatment interventions. Digital clinical trials integrate AI to facilitate the analysis of data from multiple sources obtained through digital products such as wearable technology and smartphone-enabled self-reported questionnaires. According to Acosta et al. (2022), applying AI addresses limitations of the randomized clinical trial (RCT) study design, including planning, time, and financial constraints, as well as underrepresentation due to geographic, sociocultural, and economic disparities. The emergence of digital clinical trials will improve diagnostic, prognostic, and therapeutic intervention investigations by eliminating current RCT barriers. For instance, Google's time-series analysis shows promise in achieving interpretable time-series forecasting using static and time-dependent inputs. In this context, static inputs would be a patient's genetic background, while time-varying variables include the time of day and glycemic levels used to predict the risk of hypo-/hyperglycemia (Acosta et al., 2022).
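The toy sketch below illustrates the static-plus-time-varying framing rather than the forecasting architecture Acosta et al. (2022) reference: a hypothetical static risk flag is concatenated with recent glucose readings and the hour of day, and a simple classifier is fit to synthetic labels.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 500
static = rng.integers(0, 2, size=(n, 1))    # assumed binary genetic risk flag
glucose = rng.normal(100, 15, size=(n, 6))  # last six glucose readings (mg/dL)
hour = rng.integers(0, 24, size=(n, 1))     # time of day

# Concatenate static and time-varying inputs into one feature matrix.
X = np.hstack([static, glucose, hour])
# Synthetic label: at-risk patients whose latest reading is low.
y = (static[:, 0] == 1) & (glucose[:, -1] < 85)

model = LogisticRegression(max_iter=1000).fit(X, y)
print(model.predict_proba(X[:3])[:, 1])  # predicted hypoglycemia risk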

Virtual Health Assistants

Multimodal AI is applicable in data-driven applications, including virtual health assistants that help individuals at risk of developing chronic illnesses such as diabetes, obesity, or high blood pressure. According to Topol (2023), an AI-based virtual health assistant provides individuals with real-time feedback about their health data, which facilitates preventive measures and enhances self-management. Such an application interacts with users via text, voice, and images to provide personalized health information. Topol (2023) also notes that disease-specific virtual AI chatbots exist. A virtual health assistant can further analyze and coach individuals based on their physical activity, sleep, unstructured text from medical records, and the most recent medical literature.

Future Consideration: Ethical and Security Concerns

Despite multimodal AI's benefits and real-world opportunities in healthcare, ethical and data privacy concerns persist. Healthcare organizations generate and process protected health information, and the Privacy Rule under the Health Insurance Portability and Accountability Act (HIPAA) establishes protection measures for maintaining health data confidentiality in the United States. The training and deployment of multimodal AI solutions could negatively impact data privacy, causing unauthorized disclosure. There is a growing need to de-identify training data while balancing model performance to ensure alignment with regulatory requirements. Acosta et al. (2022) recommended various privacy protection techniques, including differential privacy, federated machine learning, and encryption. Differential privacy conceals personally identifiable information while preserving higher-level population statistics; the sketch below illustrates its core mechanism. Federated learning allows healthcare organizations to train an AI model without sharing raw data; the model is hosted on a trusted central server that aggregates training outcomes from the participating organizations (Acosta et al., 2022). Encryption obfuscates sensitive information and allows sharing without unauthorized exposure (Acosta et al., 2022). Further developments in these areas will support the secure, regulation-compliant adoption of multimodal AI in healthcare.
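As a minimal sketch of the differential privacy idea (using the standard Laplace mechanism, not an implementation from the cited work), the function below releases a patient count with calibrated noise; the epsilon value is an illustrative assumption, not a clinical recommendation.

import numpy as np

def dp_count(true_count: int, epsilon: float = 0.5, sensitivity: float = 1.0) -> float:
    """Release a count with Laplace noise of scale sensitivity / epsilon."""
    # A counting query changes by at most 1 when one record is added or removed,
    # so its sensitivity is 1; smaller epsilon yields stronger privacy and more noise.
    noise = np.random.default_rng().laplace(0.0, sensitivity / epsilon)
    return true_count + noise

# Hypothetical example: report how many patients in a cohort share a diagnosis.
print(round(dp_count(128), 1))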

Conclusion

In conclusion, technological developments enable safe and effective patient care within healthcare facilities. Health databases contain fragmented information from diverse sources, including electronic health records (EHR), wearable devices, payment processors, public exchange marketplaces, laboratory results, clinical observations, and medical imaging. These high-dimensional records present significant processing and management challenges that hamper smooth service delivery. Implementing multimodal AI solutions that combine various data types and multiple intelligent agents can lead to higher performance. These tools overcome the limitations of unimodal learning models, which extract insights from a single data modality, because multiple data sources interact and co-exist in the clinical setting. As such, multimodal technologies enable providers to use holistic information to make informed treatment decisions and ensure care continuity. Microsoft Composable Diffusion (CoDi) and AutoGen have emerged as leading multimodal AI tools. CoDi can simultaneously process and generate high-quality output from multiple modalities, while AutoGen leverages multiple agents to streamline dynamic and complex systems. The advantages associated with these tools include robust precision medicine, improved patient monitoring, medical error reduction, streamlined workflows, cost-effective drug development, better patient diagnosis, and enhanced healthcare access through remote care. Real-world applications for these multimodal tools include clinical decision support, remote patient monitoring, diagnostics and findings generation, community health status surveillance, digital clinical trials, and virtual health assistants. These solutions interact with text, voice, and images to provide personalized health information.



References


Acosta, J. N., Falcone, G. J., Rajpurkar, P., & Topol, E. J. (2022). Multimodal biomedical AI. Nature Medicine, 28(9), 1773-1784. https://doi.org/10.1038/s41591-022-01981-2

Agboola, A. (2024). The future of AI: Multimodal AI models. Medium. https://medium.com/@alexagboolacodes/the-future-of-ai-multimodal-ai-models-605b6f8eb009

AutoGen. (2024). Multi-agent conversation framework. GitHub. https://microsoft.github.io/autogen/docs/tutorial/introduction

Cai, Q., Wang, H., Li, Z., & Liu, X. (2019). A survey on multimodal data-driven smart healthcare systems: Approaches and applications. IEEE Access, 7, 133583-133599. https://doi.org/10.1109/ACCESS.2019.2941419

Chen, X., Xie, H., Tao, X., Wang, F. L., Leng, M., & Lei, B. (2024). Artificial intelligence and multimodal data fusion for smart healthcare: Topic modeling and bibliometrics. Artificial Intelligence Review, 57(4), 91.

Cui, C., Yang, H., Wang, Y., Zhao, S., Asad, Z., Coburn, L. A., & Huo, Y. (2023). Deep multimodal fusion of image and non-image data in disease diagnosis and prognosis: A review. Progress in Biomedical Engineering, 5(2), 022001. https://doi.org/10.1088/2516-1091/acc2fe

Hernandez, L., Kim, R., Tokcan, N., Derksen, H., Biesterveld, B. E., Croteau, A., & Gryak, J. (2021). Multimodal tensor-based method for integrative and continuous patient monitoring during postoperative cardiac care. Artificial Intelligence in Medicine, 113, 102032. https://doi.org/10.1016/j.artmed.2021.102032

Kline, A., Wang, H., Li, Y., Dennis, S., Hutch, M., Xu, Z., ... & Luo, Y. (2022). Multimodal machine learning in precision health: A scoping review. npj Digital Medicine, 5(1), 171.

Kosorok, M. R., & Laber, E. B. (2019). Precision medicine. Annual Review of Statistics and Its Application, 6, 263-286. https://doi.org/10.1146/annurev-statistics-030718-105251

Luo, Y., Liu, X. Y., Yang, K., Huang, K., Hong, M., Zhang, J., ... & Nie, Z. (2024). Toward unified AI drug discovery with multimodal knowledge. Health Data Science, 4, 0113. https://doi.org/10.34133/hds.0113

Pierre, K., Gupta, M., Raviprasad, A., Sadat Razavi, S. M., Patel, A., Peters, K., & Forghani, R. (2023). Medical imaging and multimodal artificial intelligence models for streamlining and enhancing cancer care: Opportunities and challenges. Expert Review of Anticancer Therapy, 23(12), 1265-1279. https://doi.org/10.1080/14737140.2023.2286001

Rao, D. (2024). The future of healthcare using multimodal AI-technology that can read, see, hear and sense. Oral Oncology Reports, 10, 100340. https://doi.org/10.1016/j.oor.2024.100340

Shaik, T., Tao, X., Li, L., Xie, H., & Velásquez, J. D. (2023). A survey of multimodal information fusion for smart healthcare: Mapping the journey from data to wisdom. Information Fusion, 102, 102040. https://doi.org/10.1016/j.inffus.2023.102040

Tang, Z., Yang, Z., Zhu, C., Zeng, M., & Bansal, M. (2023). Breaking cross-modal boundaries in multimodal AI: Introducing CoDi, composable diffusion for any-to-any generation. Microsoft Research Blog. https://www.microsoft.com/en-us/research/blog/breaking-cross-modal-boundaries-in-multimodal-ai-introducing-codi-composable-diffusion-for-any-to-any-generation/

Topol, E. J. (2023). As artificial intelligence goes multimodal, medical applications multiply. Science, 381(6663), eadk6139.

Vermeulen, I., Isin, E. M., Barton, P., Cillero-Pastor, B., & Heeren, R. M. (2022). Multimodal molecular imaging in drug discovery and development. Drug Discovery Today, 27(8), 2086-2099. https://doi.org/10.1016/j.drudis.2022.04.009
