Sample Selection Bias in Machine Learning for Healthcare
arXiv (2024)
Abstract
While machine learning algorithms hold promise for personalised medicine,
their clinical adoption remains limited. One critical factor behind this
limited adoption is sample selection bias (SSB), in which the study population
is less representative of the target population, leading to biased and
potentially harmful decisions. Although well known in the literature, SSB
remains scarcely studied in machine learning for healthcare.
Moreover, existing techniques try to correct the bias by balancing the
distributions of the study and target populations, which may result in
a loss of predictive performance. To address these problems, our study
illustrates the potential risks associated with SSB by examining its impact
on the performance of machine learning algorithms. Most importantly, we propose
a new research direction for addressing SSB, based on identifying the target
population rather than correcting the bias. Specifically, we propose two
independent networks (T-Net) and a multitasking network (MT-Net) for addressing
SSB, where one network or task identifies the target subpopulation that is
representative of the study population, and the second makes predictions for the
identified subpopulation. Our empirical results with synthetic and
semi-synthetic datasets show that SSB can lead to a large drop in the
performance of an algorithm on the target population compared with the
study population, as well as a substantial difference in performance between
the target subpopulations that are representative of the selected and the
non-selected patients from the study population. Furthermore, our proposed
techniques are robust across various settings, including different
dataset sizes, event rates, and selection rates, and outperform existing
bias correction techniques.
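
The two-stage idea described in the abstract (first identify who in the target population resembles the study population, then predict only for them) can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not the authors' implementation: the one-dimensional feature `x`, the ground-truth rule in `true_outcome`, the hard selection rule `x > 0`, and the stand-ins for the two networks (a decision stump in place of the prediction network and a simple range check in place of the identification network).

```python
import random


def true_outcome(x):
    # Hypothetical ground truth: the event occurs when |x| exceeds 1.
    return 1 if abs(x) > 1.0 else 0


def fit_stump(xs, ys):
    # Stand-in for the prediction network: pick the threshold t for the
    # rule "predict 1 when x > t" that maximises training accuracy.
    best_t, best_acc = xs[0], -1.0
    for t in sorted(xs):
        acc = sum((1 if x > t else 0) == y for x, y in zip(xs, ys)) / len(xs)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


def accuracy(t, xs):
    # Share of points whose stump prediction matches the ground truth.
    return sum((1 if x > t else 0) == true_outcome(x) for x in xs) / len(xs)


rng = random.Random(0)

# Study population: drawn from the same source as the target population,
# but only individuals with x > 0 are selected (assumed selection rule).
pool = [rng.uniform(-3.0, 3.0) for _ in range(2000)]
study = [x for x in pool if x > 0.0]

# Target population: a fresh, unselected sample.
target = [rng.uniform(-3.0, 3.0) for _ in range(2000)]

# Second stage: prediction model trained on the biased study sample only.
t = fit_stump(study, [true_outcome(x) for x in study])

# First stage stand-in: a target individual counts as "representative of
# the study population" when x lies inside the range seen in the study.
lo, hi = min(study), max(study)
identified = [x for x in target if lo <= x <= hi]
rest = [x for x in target if not (lo <= x <= hi)]

acc_study = accuracy(t, study)
acc_target = accuracy(t, target)
acc_identified = accuracy(t, identified)
acc_rest = accuracy(t, rest)

print(f"study population:          {acc_study:.3f}")
print(f"full target population:    {acc_target:.3f}")
print(f"identified subpopulation:  {acc_identified:.3f}")
print(f"non-identified remainder:  {acc_rest:.3f}")
```

In this toy setting the stump is close to perfect on the study population and on the identified subpopulation, while accuracy drops on the full target population and most sharply on the non-identified remainder. That mirrors, qualitatively only, the pattern the abstract reports; the real T-Net and MT-Net are neural networks whose details are not given here.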