Over the past several decades, health care systems have amassed huge stores of patient data through electronic health records, documenting disease-related genetic aberrations, drug interactions, success rates for cancer treatments, and more.
Now, clinicians and researchers with access to this body of health information are at an inflection point. Through artificial intelligence, health systems can turn amorphous aggregates of data into algorithms that predict patient health outcomes.
It’s time to put this technology to work at scale, in ways that prioritize informed protocols designed to prevent bias in data collection and in its use in patient care, said Nigam Shah, MBBS, Ph.D., a Stanford Medicine professor of biomedical data science and of medicine.
“Overall, automated disease detection or assessment of a group of symptoms can save health care workers time and hospitals money, while maintaining an excellent standard of care,” said Shah, who was recently named Stanford Health Care’s inaugural chief data scientist. “The data is there, the incentive to use it is there, and that creates this sense of urgency to deploy AI. And that urgency means we have to put guidelines in place to ensure we do it the right way, with fairness and equity first.”
Algorithms are rarely used routinely in patient care today, but this may soon change as machine learning tools balloon and flood the health care system. So now is the time to get these guidelines in place.
Using AI in the clinic means harnessing automated pattern recognition in health care data to inform diagnoses and predict medical outcomes. Are cancer patients with a certain set of mutations better off taking drug A or drug B? Do certain features of an MRI indicate signs of a condition? Does a lesion on the skin look cancerous?
Among these questions, a larger one looms: Are the outputs of clinical algorithms fair and accurate for everyone? Shah and others are putting this question under a microscope, refining a set of standard principles that could guide the use of any algorithm in patient care, something that has yet to be done.
It’s not for lack of trying — AI researchers feel an enormous sense of responsibility when devising and implementing algorithms that affect human health, Shah said. The obstacle is in the “how”.
I spoke with Shah about this challenge, the discourse around standards that prioritize fairness and equity in clinical care algorithms, and the solutions that he and others are proposing. The following questions and answers are based on our conversation.
What does it mean for an algorithm to be fair and how can this be measured?
Through the Coalition for Health AI, I work with a group of researchers from several institutions, including the Mayo Clinic, Duke University and Johns Hopkins, to develop and streamline guidelines for fair AI.
Just as in regular clinical care, algorithm-guided care should lead to fair treatment of all patients. This means making sure that the algorithms are not biased toward certain demographics and that the data we use to train the algorithms is comprehensive enough. For example, does the algorithm perform the same for men and women, or for Black and non-Black patients? Ideally, we want a model that is calibrated equally well for all patient subgroups. We also want the models to work just as reliably in a primary care setting as they do in an oncology clinic.
To help assess this, we’re doing something called a “fairness audit.” A financial audit basically asks whether your credits and debits net out to zero. By analogy, an AI fairness audit asks whether the way the algorithm works is balanced across patient demographics, performing equally well for everyone who might benefit from it.
If the algorithm does not create a systematic difference in the way care is allocated (for example, in recommending a prescription for a statin), then it can be called “fair.”
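As an illustration of what such an audit might compute, here is a minimal Python sketch. It is not the audit procedure Shah's team uses; the function names, the record layout and the choice of metrics (a per-subgroup calibration gap and the share of patients flagged above a threshold) are assumptions made for this example.

```python
from collections import defaultdict


def audit_by_subgroup(records, threshold):
    """Summarize model behavior per subgroup (hypothetical helper).

    records: iterable of (subgroup, predicted_risk, outcome) tuples, where
    predicted_risk is a probability in [0, 1] and outcome is 0 or 1.
    threshold: risk at or above which care would be recommended.
    """
    grouped = defaultdict(list)
    for subgroup, risk, outcome in records:
        grouped[subgroup].append((risk, outcome))

    summary = {}
    for subgroup, rows in grouped.items():
        n = len(rows)
        mean_risk = sum(risk for risk, _ in rows) / n
        event_rate = sum(outcome for _, outcome in rows) / n
        flagged_rate = sum(1 for risk, _ in rows if risk >= threshold) / n
        summary[subgroup] = {
            "n": n,
            "mean_predicted_risk": mean_risk,
            "observed_event_rate": event_rate,
            # Near zero means predictions match reality on average.
            "calibration_gap": mean_risk - event_rate,
            # Share of the subgroup that would be recommended care.
            "flagged_rate": flagged_rate,
        }
    return summary


def max_disparity(summary, key):
    """Largest difference in one metric between any two subgroups."""
    values = [stats[key] for stats in summary.values()]
    return max(values) - min(values)


if __name__ == "__main__":
    # Made-up data for two subgroups, for illustration only.
    example = [
        ("A", 0.10, 0), ("A", 0.30, 0), ("A", 0.20, 1), ("A", 0.20, 0),
        ("B", 0.10, 0), ("B", 0.30, 1), ("B", 0.20, 1), ("B", 0.20, 0),
    ]
    report = audit_by_subgroup(example, threshold=0.2)
    for name, stats in sorted(report.items()):
        print(name, stats)
    print("calibration gap disparity:", max_disparity(report, "calibration_gap"))
    print("flagged rate disparity:", max_disparity(report, "flagged_rate"))
```

In this made-up data, both subgroups are flagged at the same rate, yet the model underestimates risk more for subgroup B. Reporting both numbers separates "does the score differ?" from "does the allocation of care differ?"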
What are the challenges in developing new guidelines for the broad implementation of AI in healthcare and what are your conclusions?
We started by looking at the current guidelines from the research community, basically the accepted research requests for AI: What should you do to achieve responsible AI in health care? We analyzed 16 or so publications that collectively suggested 220 things you should do. And let’s just be honest: no one is going to do 220 things at a time. It is not feasible. There was also little focus on fairness and almost nothing on how to assess benefit.
So instead, we asked: How many of these 220 things would it be reasonable to ask researchers to stick to? What elements do scientists often include when reporting data in papers? Which ones do we agree are most important? In the end, there were about a dozen recommendations that were the most common, and fortunately, they are commonly reported in manuscripts as well.
Overall, this analysis informed the design of something we now call a FURM assessment, in which we seek to assess whether we are providing model-guided care that is fair, useful and reliable. To assess usefulness, we created a simulation-based framework that also factors in work capacity limits.
What types of algorithms are used at Stanford Health Care and how do you make sure they are fair?
We have two algorithms, both operating under institutional review board approval, that help guide clinical care. The first is one that my team created, a mortality prediction tool to help clinicians identify who might be at risk of dying in the next year and thus might benefit from having goals-of-care conversations.
The second is an algorithm created by radiation oncologist and professor Michael Gensheimer, MD, which predicts survival for patients with metastatic cancer based on specific treatment or drug regimens. It basically helps the doctor choose the course of treatment that is most likely to help the patient survive longer.
For both of these and any future algorithms, we start them in “silent mode,” which allows us to examine how the algorithm performs without affecting patient care. We basically run it in the background and make sure it produces results (whether a treatment recommendation or a prediction of length of hospital stay) that are equally valid across different subgroups. Once we convince ourselves that the algorithm’s guidance is reliable enough, we can deploy it in the clinic and continue to monitor its usefulness.
We deemed both algorithms in use at Stanford Health Care fair during their “silent” phase, and recently completed fairness audits using our new guidelines for responsible implementation of AI. Preliminary results from these new audits confirm our confidence.
Is a biased algorithm harmful? Can it be repaired?
First, you always want to make sure your data is comprehensive and representative. But when a case of bias appears, we look for the source of the bias. I ask a two-part question: Is there a difference in the numerical output produced by the algorithm? And how big an impact does that difference have on what happens in the clinic?
This second question is important, because sometimes the algorithm can produce a number that is technically biased but does not change the recommendation regarding patient care.
For example, a person’s risk score for a heart attack over the next 10 years is based on several different data points, all of which were collected and analyzed in the Framingham Heart Study, whose participants were mostly white men; the score was later updated to include three additional cohorts. However, it is known that heart attack risk scores are not as well calibrated for Asian people, Black people, and women as they are for white men.
But it does not create an equity problem because the bias in the risk score is quite small. To put it in context, if a score of 7 means that a patient should be prescribed statins to reduce their risk of heart disease, and an Asian person has a score of 7.3 with the uncalibrated algorithm and a score of 7.5 with a modified algorithm, their clinical outcome will remain the same: they receive statins.
When bias in numbers changes treatment or care protocol between subgroups, algorithms need to be rethought or retrained with more data. This is a problem that can be solved.
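The two-part question in this answer can be made concrete with a short Python sketch. The scores and the cutoff of 7 come from the statin example above; the function names, the extra score pairs and the at-or-above comparison are assumptions for illustration, not a clinical rule.

```python
def recommends_treatment(score, threshold):
    """True when the score is at or above the treatment cutoff (assumed rule)."""
    return score >= threshold


def decision_changes(original_score, corrected_score, threshold):
    """True when correcting the score would flip the care recommendation."""
    return (
        recommends_treatment(original_score, threshold)
        != recommends_treatment(corrected_score, threshold)
    )


def fraction_of_changed_decisions(score_pairs, threshold):
    """Share of patients whose recommendation flips after correction.

    score_pairs: list of (original_score, corrected_score) tuples.
    """
    if not score_pairs:
        return 0.0
    changed = sum(
        1 for original, corrected in score_pairs
        if decision_changes(original, corrected, threshold)
    )
    return changed / len(score_pairs)


if __name__ == "__main__":
    STATIN_THRESHOLD = 7.0  # cutoff from the example in the text

    # The patient from the example: 7.3 uncalibrated, 7.5 modified.
    print(decision_changes(7.3, 7.5, STATIN_THRESHOLD))  # False: statins either way

    # Hypothetical patient near the cutoff: the same-sized shift now matters.
    print(decision_changes(6.8, 7.2, STATIN_THRESHOLD))  # True

    # Hypothetical cohort: how often does the bias alter care?
    cohort = [(7.3, 7.5), (6.8, 7.2), (3.0, 3.4), (9.1, 9.0)]
    print(fraction_of_changed_decisions(cohort, STATIN_THRESHOLD))  # 0.25
```

A small numerical bias only becomes a care problem for patients whose scores sit close to the cutoff, which is why the size of the shift and the decision threshold have to be examined together.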
The next questions that are starting to emerge are: As algorithms are adopted in care, how will that change the doctor-patient relationship? Will it fragment care or make it more cohesive? And do health care systems have the capacity to deliver the benefit that appears to exist on paper?
These are the big questions that the Stanford Health Care Data Science team is addressing now and will continue to address.
The Coalition for Health AI is funded in part by the Gordon and Betty Moore Foundation.
The Stanford Healthcare Artificial Intelligence Program was started with a gift from Mark and Debra Leslie.
Photo by Chor Muang