Your Data Sucks, and Now Your AI/Chatbot Does Too…
Introduction
At a conference, I sat at a shared table where two men were complaining about their AI projects and the poor outcomes they had produced. They saw "Microsoft" on my badge and immediately decided they could blame me for their issues.
As I asked them some basic questions about their engagement, it became clear that they had ignored every IT 101 principle. AI is not magic, and it does not excuse you from following simple IT project management processes. You still need to care about data quality, velocity, testing, and goal setting. This article is a reminder that AI does not change the laws of physics, the best practices of IT governance, or common sense.
The modern business environment depends heavily on technological innovation to create competitive advantage and exploit new opportunities. Emerging artificial intelligence (AI) and machine learning (ML) solutions have rapidly transformed critical sectors such as healthcare, education, manufacturing, transportation, and agriculture. AI offers diverse tools that solve problems, streamline processes, create value for organizations, and automate routine tasks, which has improved efficiency, safety, and productivity. Scholarly interest in AI and ML has produced cutting-edge applications that influence daily life. For instance, individuals increasingly rely on AI-based personal assistants such as Amazon Alexa and Apple Siri, while e-commerce, streaming, and social media platforms use AI and ML algorithms to curate personalized experiences and recommend content based on users' interests. Despite these benefits, however, accurate AI models depend on high-quality data. Real-world implementations require high precision to minimize errors and potential harm, so data remains the critical foundation for practical AI innovation. A hospital with poor-quality data that attempts to build a chatbot on a large language model (LLM) using retrieval-augmented generation (RAG) will face significant challenges, including inaccurate responses, bias, response variance, and poor organizational performance.
Background: Chatbots in Healthcare
AI-based conversational chatbots have become a popular choice for organizations seeking reliable, convenient, proactive, highly available, and fast customer support. Valtolina et al. linked the success of chatbot applications to the evolving relationship between customers and organizations (3), as companies seek to improve the user experience through interactive systems. According to Reis et al., chatbots have steadily evolved from rudimentary solutions built on decision trees to sophisticated tools that use natural language processing (NLP) and pattern recognition (1). Accordingly, they defined modern chatbots as AI applications that facilitate natural communication between humans and computers (Reis et al. 3). Hospitals continue to leverage this innovation by creating domain-specific chatbots that respond to customer queries and support patient-centered care. In healthcare, facilities use chatbots to facilitate patient education and disseminate crucial information to providers. For example, Microsoft has introduced DAX Copilot, which automatically drafts clinical summaries of exam-room and telehealth conversations; the provider reviews the draft summary before making electronic health record (EHR) entries. Depending on the conversational framework they select, healthcare organizations can implement two types of chatbots: retrieval-based or generative. The following sections explore how each approach handles data.
Retrieval-Based Chatbot. Pandey and Sharma characterized retrieval-based chatbots as models that select the most appropriate response to a prompt from a list of pre-defined answers, taking into account both the input prompt and the prevailing context (2). Developers build intricate ML frameworks and rules that evaluate the user's input before selecting the best response from this pre-established list, and these models track conversations using a dialogue management framework. Pandey and Sharma indicated that retrieval-based chatbots achieve high accuracy when trained on large, sufficiently annotated and labeled datasets (2). However, these chatbots lack flexibility and generate stiff responses that do not sound “human-like.”
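To make the selection step concrete, here is a minimal Python sketch of a retrieval-based bot. It is a simplification under stated assumptions: the FAQ dictionary is hypothetical, bag-of-words cosine similarity with an arbitrary 0.3 threshold stands in for the trained ML models Pandey and Sharma describe, and dialogue management is omitted.

```python
import math
import re
from collections import Counter

# Hypothetical pre-defined question/answer pairs for a hospital front desk.
FAQ = {
    "what are the visiting hours": "Visiting hours are 9 a.m. to 8 p.m. daily.",
    "how do i request my medical records": "Submit a records request through the patient portal.",
    "where is the pharmacy": "The pharmacy is on the ground floor near the main entrance.",
}

FALLBACK = "Sorry, I don't have an answer for that. Please contact the front desk."


def bag_of_words(text: str) -> Counter:
    """Lower-case the text and count its word tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def respond(prompt: str, threshold: float = 0.3) -> str:
    """Select the stored answer whose question best matches the prompt.

    The bot never writes new text: it returns one of the pre-defined answers,
    or a fixed fallback when no stored question is similar enough.
    """
    query = bag_of_words(prompt)
    scores = {question: cosine(query, bag_of_words(question)) for question in FAQ}
    best = max(scores, key=scores.get)
    return FAQ[best] if scores[best] >= threshold else FALLBACK


print(respond("When are visiting hours?"))
```

Because every answer comes from the curated list, the bot can only be as accurate as that list: an outdated or wrong entry is returned verbatim to every patient who asks.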
Generative-Based Chatbot. In contrast, generative-based chatbots create new text in response to each query, which enables open-ended, automatic conversations. Generative models learn patterns from large volumes of training data and use them to produce original content (Pandey and Sharma 2). Because they rely on ML techniques to learn how to respond, they produce a novel response for each prompt, which makes them suitable for highly sophisticated solutions. The Chat Generative Pre-trained Transformer (ChatGPT), developed by OpenAI, is an excellent example of how such tools have changed the technology landscape and introduced novel computing capabilities. According to Loh, ChatGPT and similar tools can generate fluent, well-written text that is indistinguishable from content produced by humans (1). However, these chatbots also face challenges, since their output may contain mistakes.
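A real LLM is vastly more complex, but a toy bigram model (our own simplification, not how ChatGPT works) shows the key property in a few lines: generated text is assembled from patterns in the training data, so errors in that data resurface as fluent output. The drug name, doses, and training sentences below are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigrams(corpus: list[str]) -> dict[str, Counter]:
    """Count, for every word, which words follow it in the training sentences."""
    model: dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current, following in zip(words, words[1:]):
            model[current][following] += 1
    return model


def generate(model: dict[str, Counter], start: str, max_words: int = 10) -> str:
    """Greedily extend `start` with the most frequent next word seen in training."""
    words = [start.lower()]
    # .get() avoids creating empty entries; stop when a word has no known successor.
    while len(words) < max_words and model.get(words[-1]):
        words.append(model[words[-1]].most_common(1)[0][0])
    return " ".join(words)


# Hypothetical training notes: one correct record and a mistyped record copied twice.
corpus = [
    "drug_x dose is 5 mg daily",
    "drug_x dose is 50 mg daily",
    "drug_x dose is 50 mg daily",
]
print(generate(train_bigrams(corpus), "drug_x"))  # drug_x dose is 50 mg daily
```

The duplicated error outvotes the correct record, and the output reads just as confidently as a correct answer would.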
Data as a Key Foundation of AI and the Role of Health Data in Chatbots
Data is a crucial foundation for safe and reliable AI. IBM argued that, in the modern economy, most enterprises depend on data-driven decision-making, from product development to supply chain management (1). In the AI world, models trained on poor data lead to bad business decisions (IBM 1). Ehrlinger and Wöß linked high data quality to interpretable and trustworthy analytics (1). Failure to manage data effectively could therefore result in significant legal, financial, and ethical risks, and these concerns are becoming more pronounced as organizational leaders position their firms to embrace the benefits of AI. IBM indicated that the safety and effectiveness of AI technology, and of the tools it creates, depend significantly on the training data and the information the model continues to learn from (1). Organizations that lack quality data cannot verify the quality of AI results. Ehrlinger and Wöß demonstrated that poor-quality data increases the error rate of ML models (1). In the healthcare sector, such inaccuracies could lead to medical errors, longer hospital stays, lower patient satisfaction, and higher service costs. IBM recommends a rigorous, value-driven process for analyzing data quality, which helps firms implement IT solutions that enhance operations.
Healthcare organizations are implementing health information management systems (HIMS) at an unprecedented rate. These systems capture patient data, monitor provider performance, and guide decision-making. Reis et al. linked the growth in health information system deployment to rising costs of care and growing service demand (4). Institutions implement advanced technologies to bridge care gaps, address disparities, minimize costs, and optimize operations. For instance, Pandey and Sharma asserted that digital technologies create an opportunity for accessible mental health services that minimize stigma and provide specialized care. The shift to a patient-centered perspective, which views patients as active consumers of health information, has also increased institutional demand for IT (Reis et al. 4). As a result, facilities now generate large volumes of structured and unstructured records, giving rise to big data in health care. This data can provide essential insights into disease patterns and treatment efficacy. However, according to Batko and Ślęzak, big data cannot be analyzed and processed using conventional techniques. Taleb et al. indicated that extracting insights from big datasets is not easy (3); the process requires specialized models, algorithms, and mining techniques to extract value.
Chatbots offer an AI-based way to analyze such large health databases and disseminate information to patients, providers, administrators, and policy-makers. Data from health information systems therefore plays an essential role in chatbot development, since these solutions learn from diverse information. According to Pandey and Sharma, chatbot developers use numerous data collection techniques, including observation, research studies, experiments, statistics, and case studies. Data ingestion serves as the basis of chatbots: according to Jin, contextual data enables LLMs to move from general-purpose to domain-specific knowledge. The data ingestion cycle is thus the entry point at which organizations gather, preprocess, and transform data into a format compatible with the LLM. Goodman et al. offered a similar view, noting that LLMs learn by ingesting large volumes of unannotated records through self-supervised approaches, after which developers fine-tune and enhance the model's performance using a smaller annotated dataset (2). Understanding the data quality issues that might affect chatbot performance, and the associated risks, is therefore essential.
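The gather, preprocess, and transform cycle described by Jin might look like the following minimal sketch for a RAG pipeline. The chunk size and overlap, the Chunk record, and the choice to split on words rather than model tokens are all assumptions; embedding and indexing the chunks are left out.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    source_id: str  # which document the text came from, kept for traceability
    text: str


def preprocess(text: str) -> str:
    """Collapse runs of whitespace (line breaks, tabs, repeated spaces) and trim."""
    return re.sub(r"\s+", " ", text).strip()


def chunk_words(text: str, size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into windows of `size` words that share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    words = text.split()
    chunks = []
    for start in range(0, len(words), size - overlap):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


def ingest(documents: dict[str, str], size: int = 200, overlap: int = 20) -> list[Chunk]:
    """Gather documents, preprocess them, and transform them into indexable chunks."""
    chunks = []
    for source_id, raw in documents.items():
        cleaned = preprocess(raw)
        if not cleaned:
            continue  # skip empty documents instead of indexing blank text
        chunks.extend(Chunk(source_id, piece) for piece in chunk_words(cleaned, size, overlap))
    return chunks
```

Keeping `source_id` on every chunk matters for data quality: when the chatbot returns a wrong answer, the offending source record can be traced and corrected.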
Data Quality Metrics
Researchers have developed different metrics and dimensions for evaluating data quality, with varying definitions. IBM identified the following measures: accuracy, completeness, consistency, timeliness, uniqueness, validity, and provenance (1). Accuracy evaluates whether the data used to develop the model correctly represents the real-life subjects it describes (IBM 1). Ehrlinger and Wöß defined accuracy as the magnitude of error between the modeled data and the physical world (5). In healthcare, incorrect entries in patient records directly reduce accuracy. Completeness assesses whether the data is whole and inclusive. Ehrlinger and Wöß defined completeness as the breadth, depth, and scope of information and elements within a dataset (5). Complete data has no missing values in its required fields.
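As a rough illustration, both measures can be computed over a list of patient records. The field names, the reference table of verified values, and the rule that only None and empty strings count as missing are hypothetical choices for this sketch; in practice the reference would come from a trusted source such as a manually audited sample.

```python
MISSING = (None, "")  # assumption: these values count as "not populated"


def completeness(records: list[dict], fields: tuple[str, ...]) -> float:
    """Share of required field values that are actually populated."""
    total = len(records) * len(fields)
    filled = sum(1 for r in records for f in fields if r.get(f) not in MISSING)
    return filled / total if total else 1.0


def accuracy(records: list[dict], reference: dict[str, str], field: str) -> float:
    """Share of verifiable records whose `field` matches a trusted reference value.

    Records without a reference entry cannot be checked and are excluded.
    """
    checked = [r for r in records if r["patient_id"] in reference]
    if not checked:
        raise ValueError("no records can be verified against the reference")
    matches = sum(1 for r in checked if r.get(field) == reference[r["patient_id"]])
    return matches / len(checked)


records = [
    {"patient_id": "p1", "birth_date": "1980-02-11", "allergies": "penicillin"},
    {"patient_id": "p2", "birth_date": "", "allergies": None},
    {"patient_id": "p3", "birth_date": "1975-07-30", "allergies": "none"},
    {"patient_id": "p4", "birth_date": "1990-01-01", "allergies": "latex"},
]
reference = {"p1": "1980-02-11", "p3": "1975-03-07"}  # hypothetical audited values
```

The `ValueError` reflects the point made earlier in the article: without a trusted reference, accuracy simply cannot be verified.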
Consistency focuses on the integrity of records stored across distributed locations. According to IBM, consistent data has the same formatting and values across different systems and networks (2). Ehrlinger and Wöß indicated that consistency monitors semantic rule violations within the data or storage structure (6). Timeliness examines the currency (how frequently the data is updated) and volatility (how quickly it becomes irrelevant) of the data for the application. According to IBM, organizations evaluate timeliness by measuring the gap between when data is generated and when it is used, and decision-makers should ensure that this gap does not erode accuracy. Uniqueness evaluates whether the training data contains duplicated or overlapping records. Additionally, quality data must align with its intended purpose and established objectives, which implies that quality depends on the context and use case. Syed et al. referred to this as contextual validity, which evaluates fitness for use and relevance; the authors asserted that meaningful data must contain information relevant to the question at hand. According to IBM, failure to manage data quality could result in legal, ethical, and reputational risks (1). Poor data quality also limits an organization's ability to leverage innovative solutions.
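Consistency, uniqueness, and timeliness lend themselves to simple automated checks. The sketch below is one possible formulation, not IBM's: it assumes each system exposes records keyed by a shared patient identifier, that duplicates can be detected by an exact key match, and that the acceptable lag (`max_lag`) is set per use case.

```python
from datetime import datetime, timedelta


def consistency(system_a: dict[str, dict], system_b: dict[str, dict], field: str) -> float:
    """Share of patients stored in both systems whose `field` values agree."""
    shared = system_a.keys() & system_b.keys()
    if not shared:
        raise ValueError("the systems have no patients in common")
    agree = sum(1 for pid in shared if system_a[pid].get(field) == system_b[pid].get(field))
    return agree / len(shared)


def uniqueness(keys: list[str]) -> float:
    """Ratio of distinct keys to records; 1.0 means no duplicated records."""
    return len(set(keys)) / len(keys) if keys else 1.0


def is_timely(generated_at: datetime, available_at: datetime, max_lag: timedelta) -> bool:
    """True if the record became usable within the allowed lag after it was generated."""
    return available_at - generated_at <= max_lag


# Hypothetical blood-type records held by two systems.
ehr = {"p1": {"blood_type": "A+"}, "p2": {"blood_type": "O-"}, "p3": {"blood_type": "B+"}}
lab = {"p1": {"blood_type": "A+"}, "p2": {"blood_type": "O+"}, "p4": {"blood_type": "AB-"}}
```

Here `consistency(ehr, lab, "blood_type")` compares only p1 and p2, the patients both systems know about, and finds that one of the two disagrees.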
Data Quality Issues Impacting Chatbots in Healthcare
The healthcare sector presents significant data quality issues that affect chatbot development. Because of their adverse impacts, these issues hinder technological progress and real-world applications. This section examines the data quality issues discussed in the literature.
Data Privacy Concerns. First, the healthcare sector faces privacy and confidentiality concerns when handling patient data, because service providers store and transmit legally protected personal information. Loh identified AI as a potential privacy risk because chatbots built on LLM architectures learn by systematically searching and ingesting information from public and private databases. Furthermore, giving an AI solution access to confidential patient data without consent constitutes a privacy breach. According to Reis et al., the privacy and security of health information remain critical issues when implementing IT solutions. Even when a technology directly benefits patients, privacy still presents an ethical dilemma. Organizations may overcome this limitation by de-identifying records, but removing or generalizing identifying fields discards information, which can compromise accuracy, completeness, and consistency.
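The trade-off is easy to see in code. This deliberately simplistic Python sketch is not a compliant de-identification method (regulations such as HIPAA define their own requirements); the field names, the phone-number pattern, and the salted-hash pseudonym are assumptions made for illustration.

```python
import hashlib
import re

DIRECT_IDENTIFIERS = ("name", "phone", "address")  # hypothetical field names
PHONE_PATTERN = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")  # only matches 555-123-4567 style numbers


def deidentify(record: dict, salt: str) -> dict:
    """Return a copy of `record` with direct identifiers removed or masked."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    # Replace the real ID with a salted hash so one patient's records can still be linked.
    out["patient_id"] = hashlib.sha256((salt + record["patient_id"]).encode()).hexdigest()[:16]
    # Generalize the birth date to the year: safer, but exact age is no longer recoverable.
    out["birth_date"] = record["birth_date"][:4]
    # Mask phone numbers that appear inside free-text notes.
    out["note"] = PHONE_PATTERN.sub("[PHONE]", record["note"])
    return out


record = {
    "patient_id": "p1",
    "name": "Jane Doe",
    "phone": "555-123-4567",
    "address": "1 Main St",
    "birth_date": "1980-02-11",
    "note": "Call back at 555-123-4567 about lab results.",
}
```

Generalizing the birth date to a year protects privacy but permanently removes detail a clinical chatbot might need (for example, exact age for pediatric dosing), which is how de-identification erodes accuracy and completeness. The regex also shows the fragility of pattern-based masking: a number written as (555) 123-4567 would slip through.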
Missing Data. EHRs frequently contain missing data, and Syed et al. identified this as a common problem affecting information systems. Fragmentation of data across networks and dissimilar systems further reduces completeness. Training AI on data with missing values could result in erroneous and biased responses.
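Because fragmentation is one cause of missing values, a reasonable first step is to merge a patient's partial records before measuring what is still absent. In this minimal sketch the source systems, field names, missing-value markers, and the "first non-missing value wins" precedence are hypothetical; real pipelines need explicit rules for conflicting values.

```python
MISSING = (None, "", "N/A")  # assumption: markers that mean "no value recorded"


def merge_fragments(fragments: list[dict], fields: tuple[str, ...]) -> tuple[dict, list[str]]:
    """Combine one patient's partial records from several systems.

    For each field, take the first non-missing value in `fragments` order, then
    report the fields that are still missing after the merge.
    """
    merged = {}
    for name in fields:
        merged[name] = next(
            (frag[name] for frag in fragments if frag.get(name) not in MISSING), None
        )
    still_missing = [name for name in fields if merged[name] is None]
    return merged, still_missing


# Hypothetical fragments of the same patient held by two systems.
lab_system = {"patient_id": "p1", "allergies": "", "last_a1c": "7.1"}
clinic_system = {"patient_id": "p1", "allergies": "penicillin", "last_a1c": None}
```

Fields that remain empty after the merge are reported so they can be reviewed, rather than silently passed to the model as gaps.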
Inconsistent and Fragmented Records. The healthcare setting also suffers from inconsistent record capture and storage. Healthcare data is highly fragmented and stored across multiple systems in different formats. Syed et al. attributed data inconsistency in health care to manual data entry and to multidisciplinary teams with varying needs. The researchers also identified temporal variability, which arises when medical practices and policies change over time, as a source of inconsistent records. Semantic inconsistencies across systems also arise from logical contradictions (Syed et al.). These issues directly lower the consistency measure discussed above.
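Harmonizing formats is a common first fix for inconsistent records. The sketch assumes, hypothetically, that source systems use one of three date formats and record body weight in either kilograms or pounds; anything else raises an error instead of being guessed.

```python
from datetime import datetime

# Hypothetical date formats observed in different source systems.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")
LB_PER_KG = 2.20462


def normalize_date(value: str) -> str:
    """Convert a date in any known source format to ISO 8601 (YYYY-MM-DD)."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next known format
    raise ValueError(f"unrecognized date format: {value!r}")


def weight_kg(value: float, unit: str) -> float:
    """Express a body-weight reading in kilograms, rounded to one decimal."""
    unit = unit.strip().lower()
    if unit in ("kg", "kgs"):
        return round(float(value), 1)
    if unit in ("lb", "lbs"):
        return round(value / LB_PER_KG, 1)
    raise ValueError(f"unknown weight unit: {unit!r}")
```

Note that a string such as 03/04/2024 is read here as month/day because `%m/%d/%Y` is tried first; that ordering is itself an assumption that must be confirmed for each source system.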
Standardization of Terms. The healthcare sector also faces data quality issues resulting from unstandardized terminology. Syed et al. noted that health information systems handle data in different ways, and workflows vary from one clinical setting to another. As a result, chatbots face portability challenges: a chatbot built on one organization's terminology may not transfer cleanly to another.
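A common mitigation is to map local terms onto a shared vocabulary before the data reaches the chatbot. In this sketch the synonym table is a hypothetical, hand-written example (I10 and E11.9 are the ICD-10-CM codes for essential hypertension and for type 2 diabetes without complications); a production system would rely on a maintained terminology service instead.

```python
import re
from typing import Optional

# Illustrative local-term table mapped to ICD-10-CM codes (hypothetical coverage).
LOCAL_TO_ICD10 = {
    "htn": "I10",
    "hypertension": "I10",
    "high blood pressure": "I10",
    "t2dm": "E11.9",
    "type 2 diabetes": "E11.9",
}


def standardize(term: str) -> Optional[str]:
    """Map a free-text diagnosis term to a standard code, or None if unmapped."""
    key = re.sub(r"\s+", " ", term.strip().lower())
    return LOCAL_TO_ICD10.get(key)


def standardize_all(terms: list[str]) -> tuple[list[str], list[str]]:
    """Return the mapped codes and the terms that still need human review."""
    codes, unmapped = [], []
    for term in terms:
        code = standardize(term)
        if code is None:
            unmapped.append(term)
        else:
            codes.append(code)
    return codes, unmapped
```

Unmapped terms are returned for review rather than guessed, so a chatbot never treats a local abbreviation it does not understand as a known diagnosis.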
Risks of Poor Data Quality for a Health Service Chatbot Application
Poor-quality data presents significant risks for an LLM-based chatbot that generates health information. According to Syed et al., poor data quality affects the patient care continuum, treatment safety, research findings, and provider efficiency. This section highlights some of the hazards of poor data quality within the healthcare industry.
Declining Organizational Performance. Poor data quality affects organizational performance. Healthcare organizations use chatbots to provide physicians and patients with timely information for sound decision-making. A chatbot built on poor-quality data will lead to incorrect decisions that harm business performance.
Misinformation. A chatbot trained on poor-quality data will produce imprecise information. Jin indicated that inaccurate and erroneous data may introduce misinformation and incorrect answers. In the healthcare setting, such inaccuracies could compromise patient safety and increase treatment risks. Incomplete or inaccurate answers can harm patients' health and even endanger lives (Reis et al. 1). These impacts directly affect treatment outcomes and patient satisfaction.
Response Variance. Data that contains outlier values may increase the variance of the information generated by the LLM. According to Jin, the model becomes inconsistent and poorly fits real-world healthcare applications. Response variance may result in the same patient receiving different information from the chatbot, which may cause confusion or contribute to medical errors.
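One way to reduce this source of variance is to screen numeric fields for outliers before ingestion. The sketch below uses the modified z-score, which is built on the median and the median absolute deviation (MAD) with the conventional 0.6745 scaling constant and 3.5 cut-off; the threshold and the sample temperatures, including the mistyped 986.0, are illustrative assumptions.

```python
import statistics


def robust_outliers(values: list[float], threshold: float = 3.5) -> list[float]:
    """Flag values whose modified z-score exceeds `threshold`.

    The median and MAD are used instead of the mean and standard deviation
    because a single extreme value cannot drag them toward itself.
    """
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        # More than half the values are identical; flag anything that differs.
        return [v for v in values if v != median]
    return [v for v in values if 0.6745 * abs(v - median) / mad > threshold]


# Hypothetical body temperatures (deg F) with one data-entry error.
temperatures = [98.6, 98.4, 99.1, 98.7, 986.0, 98.9]
```

Flagged values should be routed for review rather than deleted, since a genuine extreme reading is clinically important.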
Bias. Data quality issues introduce the risk of bias in the generated text. Loh noted that chatbots are as vulnerable to bias as other predictive models, since LLM outcomes are influenced by the training data (2). Poor-quality data with missing elements or fields may therefore amplify bias in AI responses.
Controlling Data Quality
Organizations can adopt different approaches for managing and controlling data quality. First, Jin recommended implementing holistic quality and governance policies; according to the author, governance is a continuous process that allows firms to assess alignment with legal requirements and industry best practices. Second, data integration allows organizations to consolidate fragmented data stored across diverse sources. Finally, data cleaning and preprocessing are crucial for ensuring the data meets the formatting requirements of the LLM architecture. This phase involves applying the necessary transformations, removing duplicates, addressing missing values, and selecting appropriate data types.
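The cleaning steps named here (transformation, duplicate removal, handling missing values, and enforcing data types) can be sketched for a hypothetical feed of lab results. The field names and the decision to reject rather than impute missing values are assumptions; whether imputation is acceptable depends on the clinical use case.

```python
def clean_lab_rows(rows: list[dict]) -> tuple[list[dict], list[dict]]:
    """Normalize, de-duplicate, and type-check raw lab rows.

    Returns (cleaned, rejected). Rows with a missing or non-numeric value are
    rejected for review instead of being imputed.
    """
    seen = set()
    cleaned, rejected = [], []
    for row in rows:
        # Transformation: trim text and lower-case test names before comparing rows.
        record = {
            "patient_id": str(row.get("patient_id", "")).strip(),
            "test": str(row.get("test", "")).strip().lower(),
            "taken_at": str(row.get("taken_at", "")).strip(),
        }
        key = (record["patient_id"], record["test"], record["taken_at"])
        if key in seen:
            continue  # duplicate of an event already processed
        seen.add(key)
        try:
            record["value"] = float(row["value"])  # enforce a numeric data type
        except (KeyError, TypeError, ValueError):
            rejected.append(row)  # missing or non-numeric value
            continue
        cleaned.append(record)
    return cleaned, rejected


raw_rows = [
    {"patient_id": "p1", "test": "A1C", "taken_at": "2024-03-01", "value": "7.1"},
    {"patient_id": "p1", "test": "a1c ", "taken_at": "2024-03-01", "value": "7.1"},
    {"patient_id": "p2", "test": "a1c", "taken_at": "2024-03-02", "value": ""},
    {"patient_id": "p3", "test": "a1c", "taken_at": "2024-03-02", "value": " 6.4 "},
]
```

Normalizing before de-duplicating is deliberate: "A1C" and "a1c " describe the same event, and comparing raw strings would have kept both copies.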
Microsoft recommends data tiering for quality control management. The approach establishes three data tiers: bronze, silver, and gold. The bronze tier holds raw, untransformed datasets; silver contains cleansed and semi-processed datasets; and gold stores highly processed, optimized, and transformed datasets. Microsoft also recommends a validation stage between the bronze and silver tiers to ensure that all data conforms to established governance rules and policies.
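The bronze, silver, and gold tiers (often called a medallion architecture) can be sketched in memory to show where validation sits. Everything here is a hypothetical stand-in: the Lakehouse class replaces real storage, the glucose feed and its plausibility range are invented for illustration, and the gold tier simply averages readings per patient.

```python
from dataclasses import dataclass, field
from statistics import mean


def _is_number(value) -> bool:
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False


# Hypothetical governance rules applied between the bronze and silver tiers.
RULES = {
    "has_patient_id": lambda r: bool(str(r.get("patient_id") or "").strip()),
    "numeric_glucose": lambda r: _is_number(r.get("glucose")),
    "plausible_glucose": lambda r: _is_number(r.get("glucose")) and 10 <= float(r["glucose"]) <= 1000,
}


@dataclass
class Lakehouse:
    bronze: list = field(default_factory=list)      # raw records, exactly as received
    silver: list = field(default_factory=list)      # validated and cleansed records
    gold: dict = field(default_factory=dict)        # curated, consumption-ready aggregates
    quarantine: list = field(default_factory=list)  # (record, failed rule names)

    def land(self, raw_records: list[dict]) -> None:
        """Append copies of raw records to bronze without transforming them."""
        self.bronze.extend(dict(r) for r in raw_records)

    def promote_to_silver(self) -> None:
        """Validate every bronze record; cleanse passing ones, quarantine the rest."""
        self.silver, self.quarantine = [], []
        for record in self.bronze:
            failures = [name for name, rule in RULES.items() if not rule(record)]
            if failures:
                self.quarantine.append((record, failures))
            else:
                self.silver.append({"patient_id": str(record["patient_id"]).strip(),
                                    "glucose": float(record["glucose"])})

    def build_gold(self) -> None:
        """Aggregate silver into mean glucose per patient."""
        by_patient: dict[str, list[float]] = {}
        for record in self.silver:
            by_patient.setdefault(record["patient_id"], []).append(record["glucose"])
        self.gold = {pid: mean(values) for pid, values in by_patient.items()}
```

The key design point is that bronze keeps records exactly as received, while records that fail validation are quarantined with the names of the rules they broke instead of reaching silver, so the chatbot only ever sees data that passed governance checks.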
Conclusion
Emerging AI applications are rapidly advancing healthcare, and chatbots are playing an instrumental role in this transformation. Chatbots provide highly available, automated customer service and resource management tools, and they have helped improve access to health information and crucial services. Research evidence indicates that the healthcare sector faces an acute shortage of skilled labor, so automation can help organizations make effective use of scarce resources. Chatbots also improve health literacy and enhance provider performance. Healthcare organizations can implement two types of chatbots: retrieval-based and generative. The former selects the most appropriate response to a prompt from a list of pre-defined answers, while the latter creates new text for each query, enabling automatic conversations. Despite these advantages, a hospital with poor-quality data that attempts to leverage a chatbot will face significant challenges, including inaccurate responses, bias, response variance, and poor organizational performance. Privacy concerns, missing data, inconsistent and fragmented records, and poor standardization of terms across healthcare systems all degrade the accuracy, completeness, consistency, timeliness, uniqueness, and validity of the underlying data. Healthcare organizations can improve data quality by adopting best practices focused on data governance, integration, preprocessing, and tiering.
Works Cited
Batko, Kornelia, and Andrzej Ślęzak. "The use of Big Data Analytics in healthcare." Journal of Big Data 9.1 (2022): 1-24.
Cattaneo, L., et al. "On the role of Data Quality in AI-based Prognostics and Health Management." IFAC-PapersOnLine 55.19 (2022): 61–66.
Ehrlinger, Lisa, and Wolfram Wöß. "A survey of data quality measurement and monitoring tools." Frontiers in Big Data 5 (2022): 1-30.
Goodman, Rachel, et al. "Accuracy and reliability of chatbot responses to physician questions." JAMA Network Open 6.10 (2023): e2336483.
IBM. “Data Quality, AI Performance and Trust.” Data and Trust Alliance, n.d., www.ibm.com/downloads/cas/WYMOR8E5. Accessed 15 March 2024.
Jin, Sophie. “The importance of data ingestion and integration for enterprise AI.” IBM, 2024, www.ibm.com/blog/the-importance-of-data-ingestion-and-integration-for-enterprise-ai/. Accessed 16 March 2024.
Loh, Erwin. "ChatGPT and generative AI chatbots: challenges and opportunities for science, medicine and medical leaders." BMJ Leader (2023): 1-4.
Microsoft Corporation. “Data and DataOps Fundamentals.” Microsoft Solutions Playbook. playbook.microsoft.com/code-with-engineering/design/design-patterns/data-heavy-design-guidance/?h=data. Accessed 16 March 2024.
Microsoft. “Microsoft introduces new data and AI solutions to help healthcare organizations.” Microsoft, 2024, news.microsoft.com/october-2023-healthcare-news/. Accessed 16 March 2024.
Pandey, Sumit, and Srishti Sharma. "A comparative study of retrieval-based and generative-based chatbots using Deep Learning and Machine Learning." Healthcare Analytics 3 (2023): 1-12.
Reis, Lea, et al. "Chatbots in healthcare: Status quo, application scenarios for physicians and patients and future directions." (2020): 1-16.
Syed, Rehan, et al. “Digital Health Data Quality Issues: Systematic Review.” Journal of Medical Internet Research 25 (2023).
Taleb, Ikbal, et al. "Big data quality framework: a holistic approach to continuous quality management." Journal of Big Data 8.1 (2021): 1-10.
Valtolina, Stefano, Barbara Rita Barricelli, and Serena Di Gaetano. "Communicability of traditional interfaces VS chatbots in healthcare and smart home domains." Behaviour & Information Technology 39.1 (2020): 108-132.