Identifying and Mitigating Machine Learning-Induced Overdiagnosis in Acute Kidney Failure

Ashleigh Armstrong, Daniel Capurro, Douglas Pires

Background: Machine Learning (ML) has been employed throughout healthcare to predict disease. Diagnosing at-risk, asymptomatic patients can result in overdiagnosis: a correct diagnosis of an illness whose detection may not improve patient outcomes. Overdiagnosis should be mitigated because it can lead to unnecessary procedures and financial strain. Acute Kidney Failure (AKF) involves an abrupt change in kidney function and is particularly damaging in patients who are already ill. Overdiagnosis of AKF would lead to invasive, expensive and unnecessary treatment.

Aims: This project aims to develop methods to identify and characterise overdiagnosis in ML-diagnosed AKF patients and to provide overdiagnosis-mitigating guidelines for algorithm development.

Methods: Patient data were sourced from MIMIC-IV. Demographic and chart event data from Intensive Care Unit (ICU) stays were selected, and the most informative features were filtered using statistical analysis. Predictive models were trained to predict whether patients would present with AKF within 48 hours of admission. Verified true cases will be analysed using process mining to determine whether overdiagnosis occurred.
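As an illustration of the prediction target described above, the following minimal sketch labels a stay as positive when AKF onset falls within 48 hours of admission. The function name and its two timestamp arguments are hypothetical and do not correspond to MIMIC-IV column names; how onset time was actually established in this study is not stated here, so this is an assumption-laden sketch rather than the authors' pipeline.

```python
from datetime import datetime, timedelta
from typing import Optional

# The 48-hour horizon comes from the Methods text; everything else is illustrative.
PREDICTION_WINDOW = timedelta(hours=48)


def akf_within_window(admitted_at: datetime,
                      akf_onset_at: Optional[datetime]) -> int:
    """Return 1 if AKF onset falls within 48 hours after admission, else 0.

    Both argument names are hypothetical. A missing onset time is treated
    as "no AKF observed", which is itself an assumption.
    """
    if akf_onset_at is None:
        return 0
    elapsed = akf_onset_at - admitted_at
    # Onset before admission is not counted as a case to be predicted.
    return int(timedelta(0) <= elapsed <= PREDICTION_WINDOW)


if __name__ == "__main__":
    admitted = datetime(2020, 1, 1, 8, 0)
    print(akf_within_window(admitted, datetime(2020, 1, 2, 8, 0)))  # onset after 24 h
    print(akf_within_window(admitted, datetime(2020, 1, 4, 8, 0)))  # onset after 72 h
```

Treating onset at exactly 48 hours as a positive case is a design choice made for this sketch only.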

Results: A balanced data set of 27,854 patients was compiled and used to train several learning algorithms. The best model, Logistic Regression, achieved an Area Under the ROC Curve (AUC) of 0.78, comparable to previous efforts in the literature.
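For readers less familiar with the reported metric, the Area Under the ROC Curve can be read as the probability that a randomly chosen AKF case receives a higher predicted risk than a randomly chosen non-case. The sketch below computes it directly from that definition on made-up toy scores; it is not the evaluation code used in this study, and the function name `roc_auc` is introduced here for illustration only.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area Under the ROC Curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the positive
    case has the higher score; ties count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Toy data, not study results: two cases and two non-cases.
    toy_labels = [1, 1, 0, 0]
    toy_scores = [0.9, 0.4, 0.6, 0.1]
    print(roc_auc(toy_labels, toy_scores))  # 3 of 4 pairs ranked correctly -> 0.75
```

The pairwise loop is quadratic in the number of patients, so for a cohort of this size a rank-based implementation such as scikit-learn's `roc_auc_score` would be the practical choice.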

Conclusions: Preliminary results show promising predictive performance, which will be refined further. Data mining and clustering will be applied to true positive AKF cases to determine whether overdiagnosis occurred. This work will serve as a pilot study to generate guidelines for algorithm development and the safe deployment of ML-guided diagnosis.