Are we ready for machine radiologists, and their mistakes?

Anton Dolgikh, head of AI at DataArt, considers whether we are ready for artificial radiologists, and their mistakes, as solutions are sought to lighten the load on the workforce.

“We are at a crossroads.” These words have become synonymous with the healthcare revolution over the last few years. During this time, advances in Machine Learning-based image processing have reached impressive heights. A quick glance at the latest news and you will be bombarded with headlines like ‘AI generates faces of non-existent people.’ Yet the news from the medical front is less positive, laden with phrases that begin with “understanding and confronting our mistakes…” A great many articles on medical image processing report that the number of scans per patient has grown dramatically over the last couple of years, and so too has the burden on radiologists. What we need is an unerring and sleepless assistant to come and save the situation. But are we ready for artificial radiologists? And, more importantly, are we ready for their mistakes?

There’s no denying that the workload for radiologists is continuously on the rise. The heavier the workload, the more likely errors become – can one possibly fathom processing hundreds of images per shift? Becoming a radiologist is not for the faint-hearted: it takes 14 years of post-high-school education to train an expert radiologist, and a long 20-22 years to nurture a narrowly focused specialist. Simply training more radiologists is not the solution to reducing radiologists’ errors.

In this regard, the adoption of an automatic, AI-driven decision-making system looks like a strong alternative. Such systems have the obvious advantages of not being influenced by the time of day or by the number of patients, nor do they need breaks as human radiologists do. Moreover, they continually learn from new cases, just as radiologists do. However, problems lurk under this shiny surface of benefits. How should one test such a system before using it in clinical conditions? One can argue that we have substantial experience in designing decision-making systems for nuclear power plants or airplanes – quite sophisticated systems too. But the main difference between an airplane and a human is that we still know very little about the mechanisms governing life processes, while an airplane is driven by physical laws that we understand rather well.

In medicine, there is no black box that magically solves problems, because one needs to understand both the solution itself and how it is obtained. This need has given rise to the term "explainable AI" with regard to AI in healthcare. It is difficult to perceive what is going on inside the workhorses of image processing, Neural Networks (NNs); it is barely possible to predict their behaviour. Can we say how many images are needed to train a specific NN to a predefined accuracy level? No. We can answer this question only empirically. Can we say how data set quality influences the predictive power of NNs? Yes: we know that a data set must be labelled with extreme precision, which is why medical data sets with cancer images need to be accompanied by biopsy results. But if the opinions of two radiologists coincide in only 60% of cases, how does one provide a reliable process for labelling the images used to train AI?
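As a purely hypothetical illustration of that labelling problem, reader agreement is usually reported with a chance-corrected statistic such as Cohen’s kappa rather than a raw percentage. The short Python sketch below uses invented labels and the scikit-learn library; it is an assumed example, not data from any real study.

    # Hypothetical sketch: chance-corrected agreement between two radiologists
    # labelling the same ten scans (1 = suspicious finding, 0 = normal).
    from sklearn.metrics import cohen_kappa_score

    # Invented labels, purely for illustration.
    reader_a = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
    reader_b = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]

    # Raw agreement: the fraction of scans on which the two readers coincide.
    raw_agreement = sum(a == b for a, b in zip(reader_a, reader_b)) / len(reader_a)

    # Cohen's kappa corrects that figure for agreement expected by chance,
    # which is what matters when judging whether labels are reliable enough
    # to train a model on.
    kappa = cohen_kappa_score(reader_a, reader_b)

    print(f"raw agreement: {raw_agreement:.0%}")   # 70% for these toy labels
    print(f"Cohen's kappa: {kappa:.2f}")           # 0.40 for these toy labels

Even a seemingly respectable raw-agreement figure shrinks once chance agreement is discounted, which is exactly why a 60% overlap between two readers is so troubling for anyone assembling a training set.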

According to the Kim-Mansfield scheme, radiologists’ errors can be divided into 12 types, most of them cognitive errors inherent to human nature. What about blind spots in knowledge? AI is only as good as the data it is trained on, and it can be just as ignorant about rare cases that were absent from the training data sets.

The use of decision-making systems introduces cognitive bias, which gives rise to two types of errors: the error of commission and the error of omission. A clinician may act on a system’s advice, neglecting their own knowledge and experience; conversely, an error can occur when a clinician ignores the prompt from the machine. Would a patient calmly accept a radiologist’s mistake if they knew that the mistake was made by AI? Would it make malpractice claims less probable? Radiologists can be tired and emotionally biased, whereas any layman would suppose that AI is devoid of such cognitive weaknesses. However, don’t the super-abilities of AI make a patient much less tolerant of AI’s errors? It is essential to form a more realistic understanding of “radiologic” AI and to improve patients’ perception of its faults. If there is to be a future for artificial radiologists, developing clear and comprehensive educational material is part of the solution to tolerating AI’s errors in radiology.

For AI to work on a par with radiologists, the industry needs a visionary – a company or a person able to demonstrate how to integrate AI seamlessly into radiology workflows, how to avoid the traps of cognitive bias, and how to pave the way for advances in image processing to enter clinical practice.

Above all, it is indispensable to remember that processing medical images is neither Data Science nor Machine Learning: it is medicine.
