Machine Learning Disparities in People of Color – Analyzing The Impact In Healthcare

Authors: Amina Khalpey, PhD, Kirtana Roopan, Zain Khalpey, MD, PhD, FACS

Machine learning tools in healthcare are rapidly gaining popularity as a means of improving accuracy and efficiency in diagnosing and predicting diseases, including cardiovascular disease (CVD). However, questions have arisen as to the reliability of these tools in people of color, who are often underrepresented in healthcare data and studies. This blog post will explore the current state of research on machine learning in healthcare, the potential benefits and drawbacks of these tools, and the potential impact on communities of color.

Background:

Cardiovascular disease is a leading cause of death worldwide, responsible for over 17 million deaths each year. The onset of CVD can be difficult to predict, and early identification and intervention are crucial to reducing the risk of serious complications. To address this issue, healthcare providers have begun to turn to machine learning tools as a means of identifying individuals at high risk of CVD.

Machine learning algorithms use data and statistical methods to learn patterns from past examples and make predictions about new cases. In healthcare, machine learning tools have been used to analyze large amounts of data, such as electronic health records (EHRs), to predict disease outcomes and identify individuals at high risk for CVD. These tools can also be applied to imaging data, such as CT scans and MRIs, to diagnose and monitor diseases such as heart failure and stroke and to anticipate complications.

Benefits and Drawbacks of Machine Learning Tools in Healthcare:

There are several potential benefits to using machine learning tools in healthcare, including improved accuracy and efficiency in diagnosing and predicting diseases. Machine learning algorithms can be trained on large amounts of data and can analyze that data quickly and accurately, making it possible to identify patterns and correlations that may be missed by traditional methods. Additionally, machine learning tools can be used to identify individuals at high risk of CVD, allowing healthcare providers to target interventions and preventative measures to those most in need.

However, there are also potential drawbacks to using machine learning tools in healthcare, particularly when it comes to people of color. One major issue is the lack of representation of communities of color in healthcare data and studies. This underrepresentation can result in biased algorithms that do not accurately reflect the health needs and outcomes of these populations.

Bias in data and AI applications can arise in several ways. Embedded data bias includes incomplete health data, lack of diversity in study cohorts, overreliance on individuals or groups who have access to care and robust health data profiles, and bias in model structure, sample selection, and the choice of prediction metrics. There is also a human element in data generation and application: people are involved in how data are generated, collected, supported, and translated into medical practice, and each of these steps is an avenue through which bias can enter. Algorithms can also be biased by a lack of data on the social determinants of health, which help put patient health information into etiologic context. Additionally, bias among providers in the form of stereotyping, unconscious biases, and other discriminatory practices can affect the electronic health records used to train AI (1). These sources of bias can significantly degrade the performance of AI, leading to incorrect predictions or diagnoses.
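One way to make this kind of bias visible is to report a model's performance separately for each demographic group rather than as a single overall number. The sketch below is illustrative only: the function name `sensitivity_by_group`, the group labels "A" and "B", and the toy records are all hypothetical and do not come from any study cited here. It computes sensitivity (the share of true cases the model catches) per group.

```python
from collections import defaultdict


def sensitivity_by_group(records):
    """Return the true positive rate (sensitivity) for each group.

    Each record is a (group, y_true, y_pred) tuple with 0/1 labels.
    Groups that contain no positive cases map to None.
    """
    true_pos = defaultdict(int)
    positives = defaultdict(int)
    groups = set()
    for group, y_true, y_pred in records:
        groups.add(group)
        if y_true == 1:
            positives[group] += 1
            if y_pred == 1:
                true_pos[group] += 1
    return {
        g: (true_pos[g] / positives[g] if positives[g] else None)
        for g in sorted(groups)
    }


# Hypothetical toy data: (group, actual outcome, model prediction).
records = [
    ("A", 1, 1), ("A", 1, 1), ("A", 1, 1), ("A", 1, 0), ("A", 0, 0),
    ("B", 1, 1), ("B", 1, 0), ("B", 0, 0),
]

rates = sensitivity_by_group(records)
print(rates)  # {'A': 0.75, 'B': 0.5}
```

In this toy example the model detects 3 of 4 true cases in group A but only 1 of 2 in group B, a gap that an overall accuracy figure would hide.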

Impact on Communities of Color:

Communities of color are at increased risk for a number of health conditions, including CVD. However, these populations are often underrepresented in healthcare data and studies, so their specific health needs and outcomes are poorly understood. Algorithms trained on such data may not accurately reflect those needs and outcomes, leading to incorrect diagnoses and unnecessary or inadequate treatment plans.

Additionally, communities of color are often subject to systemic racism and discrimination within the healthcare system, in the form of unequal access to care and unequal quality of care, which can lead to mistrust and decreased access to healthcare (2). The use of machine learning tools in healthcare may exacerbate these disparities, because algorithms may be trained on data that reflect this bias and discrimination, leading to incorrect predictions and diagnoses for individuals in these communities.

Conclusion:

Machine learning tools have enormous potential for improving the accuracy and efficiency of healthcare, particularly in diagnosing and predicting cardiovascular risk. However, their potential impact on communities of color must be considered, as these populations are often underrepresented in healthcare data and studies, which can result in biased algorithms that do not accurately reflect their health needs and outcomes. The use of these tools may also exacerbate existing disparities within the healthcare system, leading to incorrect diagnoses and treatment plans.

To mitigate these issues, it is important to prioritize diversity and representation in healthcare data and studies. There are several barriers to achieving this representation, including patient mistrust of clinical trials, health insurance coverage and restrictions, the kinds of institutions at which certain patients receive care, and other social determinants of health. In addition to overcoming these barriers to diversifying patient participation, it is imperative that studies report race diligently and explicitly, as lack of reporting can hinder efforts to increase diversity (3). Emphasis on representation and diversity in research will support the development of machine learning algorithms trained on diverse, comprehensive data sets that accurately reflect the full range of variability within populations. This will help to ensure that machine learning tools are reliable and accurate in diagnosing and predicting CVD risk in all communities, including people of color. Developing risk-of-bias evaluations for prediction models is also crucial to ensuring accurate and reliable application of machine learning in medicine (4).
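As a rough illustration of what checking representation and race reporting might look like in practice, the sketch below compares each group's share of a study cohort with its share of a reference population and counts records where race was not reported. Everything here is an assumption made for illustration: the function name `representation_ratios`, the group labels, the cohort, and the reference shares are hypothetical, and a real audit would need carefully chosen reference data.

```python
from collections import Counter


def representation_ratios(cohort_groups, reference_shares):
    """Compare cohort composition with a reference population.

    cohort_groups: one group label per participant, or None when
        race was not reported.
    reference_shares: mapping of group label to its share (0-1) of
        the reference population.

    Returns (ratios, missing). A ratio below 1.0 means the group is
    underrepresented relative to the reference; missing is the number
    of participants with no reported race.
    """
    reported = [g for g in cohort_groups if g is not None]
    missing = len(cohort_groups) - len(reported)
    counts = Counter(reported)
    total = len(reported)
    ratios = {}
    for group, ref_share in reference_shares.items():
        cohort_share = counts[group] / total if total else 0.0
        ratios[group] = cohort_share / ref_share
    return ratios, missing


# Hypothetical cohort: 8 participants from group A, 2 from group B,
# and 2 with no race recorded.
cohort = ["A"] * 8 + ["B"] * 2 + [None] * 2
# Hypothetical reference population shares.
reference = {"A": 0.6, "B": 0.4}

ratios, missing = representation_ratios(cohort, reference)
for group, ratio in ratios.items():
    print(group, round(ratio, 2))
print("race not reported:", missing)
```

Here group B makes up 20% of the participants with reported race but 40% of the assumed reference population, giving a ratio of 0.5, and two records cannot be assessed at all because race was not recorded.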

Additionally, it is crucial to continue to address systemic racism and discrimination within the healthcare system, as these factors can contribute to decreased access to care and mistrust among communities of color. Addressing these issues is necessary to create a healthcare system that is equitable, trustworthy, and effective for all individuals.

In short, while machine learning tools in healthcare offer tremendous potential for improving the accuracy and efficiency of diagnosing and predicting CVD risk, it is essential to consider their impact on communities of color and to prioritize diversity, representation, and equity in the development and implementation of these tools. This will help to ensure that machine learning tools in healthcare work reliably in all populations, and that everyone has access to the care they need to prevent and treat CVD.

References:

1. Dankwa-Mullan I, Weeraratne D. Artificial Intelligence and Machine Learning Technologies in Cancer Care: Addressing Disparities, Bias, and Data Diversity. Cancer Discov. 2022 Jun 2;12(6):1423-1427.

2. Hamed S, Bradby H, Ahlberg BM, Thapar-Björkert S. Racism in healthcare: a scoping review. BMC Public Health. 2022 May 16;22(1):988.

3. Camidge DR, Park H, Smoyer KE, Jacobs I, Lee LJ, Askerova Z, McGinnis J, Zakharia Y. Race and ethnicity representation in clinical trials: findings from a literature review of Phase I oncology trials. Future Oncol. 2021 Aug;17(24):3271-3280.

4. Andaur Navarro CL, Damen JAA, Takada T, Nijman SWJ, Dhiman P, Ma J, Collins GS, Bajpai R, Riley RD, Moons KGM, Hooft L. Risk of bias in studies on prediction models developed using supervised machine learning techniques: systematic review. BMJ. 2021 Oct 20;375:n2281.

5. Gichoya JW, Banerjee I, Bhimireddy AR, Burns JL, Celi LA, Chen LC, Correa R, Dullerud N, Ghassemi M, Huang SC, Kuo PC, Lungren MP, Palmer LJ, Price BJ, Purkayastha S, Pyrros AT, Oakden-Rayner L, Okechukwu C, Seyyed-Kalantari L, Trivedi H, Wang R, Zaiman Z, Zhang H. AI recognition of patient race in medical imaging: a modelling study. Lancet Digit Health. 2022 Jun;4(6):e406-e414.

6. Baxter JSH, Jannin P. Bias in machine learning for computer-assisted surgery and medical image processing. Comput Assist Surg (Abingdon). 2022 Dec;27(1):1-3.

7. Koçak B, Cuocolo R, dos Santos DP, Stanzione A, Ugga L. Must-have Qualities of Clinical Research on Artificial Intelligence and Machine Learning. Balkan Med J. 2023 Jan 23;40(1):3-12.

8. O’Reilly-Shah VN, Gentry KR, Walters AM, Zivot J, Anderson CT, Tighe PJ. Bias and ethical considerations in machine learning and the automation of perioperative risk assessment. Br J Anaesth. 2020 Dec;125(6):843-846.