Healthcare Data Deserts and How the Underserved Are Not Benefiting from Big Data and AI
Abstract
Data deserts are among the most impactful issues affecting modern healthcare, which is overly reliant on data and artificial intelligence (AI). Data deserts and the underlying disparities are observable in electronic health records (EHRs), patient-generated health data (PGHD), prescription drug monitoring, public health, and health research. This report examines these data deserts and their implications for affected communities to frame potential solutions. The compiled information suggests that information technology (IT) can be leveraged to narrow the data divide.
Introduction
The world is in the information age, where data is integral to better organization and new technologies and developments. In principle, the more information we have and can analyze, the better we can organize ourselves to deliver the best outcomes. Big data and AI are arguably the most important underlying concepts relevant to the healthcare sector. For example, big data and the associated AI-based analytics are crucial in medical research and delivering optimal healthcare services, helping reduce the incidence, severity, and burden of maladies in society. Big data and AI applications can be generalized as predicting disease occurrence, preventive healthcare, patient monitoring, clinical trials for drug development, and quality electronic health records (Rauch, 2024). For instance, AI algorithms can rapidly measure the amount of plaque from computed tomography angiography (CTA) images of the heart and arteries to predict heart attack and stroke (Rauch, 2024).
The increasing dependence on data and AI exacerbates the risk of unequal access to healthcare. Some individuals or groups may lack access to the requisite technologies or the capability to get data into systems rapidly. Such underserved populations or regions can be regarded as healthcare data deserts. The affected communities mainly comprise racial and ethnic minorities, including Hispanics, African Americans, and American Indians. Many Whites (non-Hispanics) are also affected, with approximately 16.7 million individuals living below the poverty line. The disparities are most prevalent for foreign-born Americans. Specifically, approximately 18.8% of individuals with no citizenship and 9.4% of naturalized citizens live in poverty. The disadvantaged communities are also more likely to include adults and people with disabilities. The disparities affecting healthcare data deserts negatively impact the social determinants of health (SDOH), resulting in relatively high exposure to disease and suffering.
Examples of Data Deserts
The healthcare sector generally lags behind other industries in updating technologies for the digital world. Elaborate healthcare data infrastructure, including AI, is needed to extract information from registries, EHRs, administrative data, and patient-owned wearable devices. Expansive data production and collection also improve the understanding of biological, environmental, and social determinants of health, which underscores the importance of healthcare data for assessing the efficacy of new medical devices and treatments. The existing data deserts can be categorized based on these requirements as EHRs, PGHD, prescription drug monitoring, public health, and health research.
Electronic Health Records
EHRs are arguably the most critical type of data in the healthcare sector. Researchers estimate that 90% of nonfederal acute care hospitals in the United States employed certified EHR technology as of January 2022 (Diebold, 2022). However, only 55% of the hospitals used EHRs to share patient data, while 73% experienced critical challenges sharing patient data across different EHR systems (Diebold, 2022). This scenario translates to a significant data desert affecting the entire country since the hospitals cannot rapidly utilize EHRs for better patient outcomes. Mack et al. (2016) also report EHR disparities affecting underserved communities. Specifically, only 53% of rural medical facilities have active EHR systems. The adoption rate is also relatively low for Medicaid-predominant providers (68.9%). These statistics suggest that the data deserts related to EHRs primarily affect rural populations and individuals depending on subsidized medical care.
Patient-Generated Health Data
EHRs also create disparities related to PGHD. Care providers regard PGHD as crucial to patient-centered and whole-person health perspectives, primarily because the data is generated in real time and used for better diagnosis, monitoring, and individual advocacy. PGHD is often sourced from wearable devices owned by patients. For instance, individuals with type 1 diabetes may wear smart devices to monitor their blood glucose levels. A care provider can leverage data from these devices to track patients’ health effectively. However, such devices are not universally available. SDOH, such as income and usual source of care, and barriers like safety issues, functionality, and confidence adversely impact the adoption of wearable devices and the associated data sharing. For example, people below the poverty line may be unable to afford the devices, leaving them in a severe data desert. Thus, while specific statistics are unavailable, a significant population cannot access optimal healthcare due to an absence of PGHD.
Prescription Drug Monitoring
Prescription drug monitoring programs (PDMPs) are among the primary methods used by national and state governments to alleviate the abuse of drugs, especially opioids. PDMPs directly monitor the prescribing and dispensing of controlled medications, focusing on data about the involved doctors and pharmacists, the affected drugs, and the patients receiving the medicines (Diebold, 2022). The data is closely tracked and analyzed to flag cases of abuse and misuse. However, significant disparities exist regarding race, gender, and income. For example, state PDMPs prompt pharmacists to be reluctant when prescribing opioids or stop attending to patients with a history of inappropriate opioid treatment (Dickson-Gomez et al., 2021). These practices result in the seclusion and omission of minority communities particularly affected by the opioid epidemic. Moreover, there is a general lack of high-quality data and an improper use of data-driven technologies, especially for communities below the poverty line (Diebold, 2022). These issues can be regarded as data deserts affecting and limiting healthcare availability to disadvantaged communities.
Public Health
Public health data infrastructure is among the most critical resources for effective regional and national evidence-based decision-making. Data collection is particularly vital in the context of ongoing and recent epidemics and crises, such as the COVID-19 pandemic. The National Center for Health Statistics, a branch of the Centers for Disease Control and Prevention (CDC), collects relevant data through large-scale surveys. However, the organization has been losing funding annually since 2009 (Diebold, 2022). The diminishing budget has resulted in small sample sizes for survey projects, reducing the available information about minority groups like American Indians. Similarly, financial constraints have limited surveys targeting nursing homes and extended care/assisted living facilities (Diebold, 2022). The monetary issues, coupled with legal and technical complexities, have also resulted in the absence of a comprehensive system for tracking children’s mental health (Diebold, 2022). These cases indicate a significant data desert affecting minorities and children, limiting their coverage in public health decision-making.
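The link between smaller survey samples and thinner information about small population groups follows from basic sampling arithmetic. The sketch below is illustrative only: it assumes simple random sampling and a 95% confidence level, and the survey sizes and the 2% subgroup share are hypothetical values chosen for the example, not figures from the cited sources. The function name is likewise an assumption.

```python
import math


def subgroup_margin_of_error(total_sample, subgroup_share, proportion=0.5, z=1.96):
    """Approximate margin of error for a proportion estimated within a subgroup.

    Assumes simple random sampling, so the expected number of subgroup
    respondents is total_sample * subgroup_share. z=1.96 corresponds to a
    95% confidence level; proportion=0.5 is the most conservative case.
    """
    subgroup_n = total_sample * subgroup_share
    if subgroup_n <= 0:
        raise ValueError("subgroup sample must be positive")
    return z * math.sqrt(proportion * (1 - proportion) / subgroup_n)


# Hypothetical scenario: a group that makes up 2% of the population,
# surveyed with progressively smaller total samples.
for total in (40000, 20000, 10000):
    moe = subgroup_margin_of_error(total, 0.02)
    print(f"survey n={total}: subgroup margin of error = {moe:.1%}")
```

Under these assumptions, each halving of the total sample widens the subgroup's margin of error by a factor of about 1.41 (the square root of 2), so estimates for small groups become unreliable well before the overall survey does.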
Healthcare Research
Healthcare research informs the development of medical devices and therapies and the understanding of social, environmental, and biological drivers of health. However, significant data deserts based on geographic locale, sexual orientation, gender identity, disability, ethnicity, and race constrain the quality of research (Winn, Tossas, & Doubeni, 2023). For example, minority and marginalized communities are often underrepresented in research and clinical trials. In their review of recent biomedical projects training clinical algorithms, Kaushal et al. (2020) found that 71% of patient cohorts are from New York, Massachusetts, and California. This discovery indicates extensive data deserts based on geographical locales. Winn et al. (2023) also report that cancer patients from marginalized groups are underrepresented in genomic databases, limiting their capacity to benefit from precision medicine.
Impact of Data Deserts on Underserved Populations
The impact of data deserts can be generalized as data and health inequality. Data inequality can be regarded as the lack of opportunities to access digital data and the underlying benefits, including scientific innovation and socioeconomic improvements (Fisher & Streinz, 2021). For example, individuals in data deserts may be unable to leverage the benefits of PGHD and EHRs. Health inequalities, in turn, are the observable differences in health outcomes and the distribution of health resources across different population groups. For instance, adults in Massachusetts are significantly more likely to receive up-to-date colorectal cancer screening (76%) than those in Wyoming (58%) (American Association for Cancer Research, n.d.). This statistic illustrates the geographic disparities in cancer care and is consistent with Kaushal et al.’s (2020) finding that 71% of patient cohorts used to train clinical algorithms are from New York, Massachusetts, and California. Similarly, Hispanic women are 69% more likely to be diagnosed with breast cancer at advanced stages than their non-Hispanic White counterparts (American Association for Cancer Research, n.d.). The severity of these disparities warrants urgent mitigations to address healthcare data deserts.
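The screening figures quoted above can be read either as an absolute gap or as a relative difference, and the two framings are easy to confuse. The short sketch below works through the arithmetic using only the 76% and 58% rates from the text; the function name is a hypothetical helper introduced for illustration.

```python
def rate_gap(rate_a, rate_b):
    """Compare two rates given in percent.

    Returns the absolute gap in percentage points and the ratio of
    rate_a to rate_b (how many times as likely group A is).
    """
    if rate_b <= 0:
        raise ValueError("rate_b must be positive")
    return rate_a - rate_b, rate_a / rate_b


# Colorectal cancer screening rates quoted in the text:
# 76% (Massachusetts) versus 58% (Wyoming).
gap, ratio = rate_gap(76, 58)
print(f"absolute gap: {gap} percentage points")
print(f"relative ratio: {ratio:.2f}")
```

On these figures, the gap is 18 percentage points, which corresponds to adults in Massachusetts being roughly 1.31 times as likely to be up to date with screening.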
Potential Solutions
IT provides a reliable avenue to address healthcare data deserts. For example, advanced surveillance models should be employed in medical research to address the underlying data divides. Winn et al. (2023) recommend the Cancer Intervention and Surveillance Modeling Network (CISNET), which applies population simulation modeling to characterize and fill knowledge gaps and suggest areas for additional research. The tool can also aid in buttressing the connection between systemic racism and cancer mortality and incidence. Moreover, medical facilities can build virtual care centers to reach underserved and rural communities. Small, well-staffed satellite clinics should also be included to help patients lacking broadband or devices to access telemedicine visits. Furthermore, state and national governments and individual hospitals should consider offering discounted or free wearable devices for more inclusive remote patient monitoring. Predictive analytics should also be employed to assess the social determinants of specific geographical areas and deliver medicine and medical services to needy communities. While these interventions can achieve significant benefits, additional research is still required to develop more advanced interventions.
Conclusion
Data and AI underpin current advancements in healthcare, helping reduce the incidence, severity, and burden of maladies in society. However, the overreliance on data exacerbates the impact of data deserts on universal access to quality care. Data deserts manifest on various fronts, including EHRs, PGHD, prescription drug monitoring, public health, and health research. The underlying disparities mainly affect minority and marginalized communities, such as Hispanics, Blacks, and American Indians. Regardless of the affected populations, the main effects of data deserts can be generalized as data and health inequalities. Health inequalities are particularly significant, with extensive disparities reported in the incidence and treatment of chronic maladies like cancer. IT solutions, such as CISNET and predictive analytics, can be leveraged to alleviate healthcare data deserts. However, research is still needed to determine feasible interventions for specific data deserts.
References
American Association for Cancer Research. (n.d.). Cancer health disparities. https://www.aacr.org/patients-caregivers/about-cancer/cancer-health-disparities/
Dickson-Gomez, J., Christenson, E., Weeks, M., Galletly, C., Wogen, J., Spector, A., ... & Ohlrich, J. (2021). Effects of implementation and enforcement differences in prescription drug monitoring programs in 3 states: Connecticut, Kentucky, and Wisconsin. Substance Abuse: Research and Treatment, 15, 1178221821992349. https://doi.org/10.1177/1178221821992349
Diebold, G. (2022). Closing the data divide for a more equitable US digital economy. Center for Data Innovation. https://www2.datainnovation.org/2022-closing-data-divide.pdf
Fisher, A., & Streinz, T. (2021). Confronting data inequality. Columbia Journal of Transnational Law, 60, 829.
Kaushal, A., Altman, R., & Langlotz, C. (2020). Geographic distribution of US cohorts used to train deep learning algorithms. JAMA, 324(12), 1212-1213. https://doi.org/10.1001/jama.2020.12067
Mack, D., Zhang, S., Douglas, M., Sow, C., Strothers, H., & Rust, G. (2016). Disparities in primary care EHR adoption rates. Journal of Health Care for the Poor and Underserved, 27(1), 327-338. https://doi.org/10.1353/hpu.2016.0016
Rauch, S. (2024, May 15). Case studies: The growing role of AI and Big Data in healthcare. Simplilearn. https://www.simplilearn.com/role-of-ai-and-big-data-in-healthcare-article
Winn, R. A., Tossas, K. Y., & Doubeni, C. (2023). Commentary: Some water in the data desert: the Cancer Intervention and Surveillance Modeling Network’s capacity to guide mitigation of cancer health disparities. JNCI Monographs, 2023(62), 167-172. https://doi.org/10.1093/jncimonographs/lgad032