ROOFS: RObust biOmarker Feature Selection
About
Feature selection (FS) is essential for biomarker discovery and clinical predictive modeling. Over the past decades, methodological literature on FS has become rich and mature, offering a wide spectrum of algorithmic approaches. However, much of this methodological progress has not fully translated into applied biomedical research. Moreover, challenges inherent in biomedical data, such as high-dimensional feature space, low sample size, multicollinearity, and missing values, make FS non-trivial. To help bridge this gap between methodological development and practical application, we propose ROOFS (RObust biOmarker Feature Selection), a Python package available at https://gitlab.inria.fr/compo/roofs, designed to help researchers in the choice of FS method adapted to their problem. ROOFS benchmarks multiple FS methods on the user's data and generates reports summarizing a comprehensive set of evaluation metrics, including downstream predictive performance estimated using optimism correction, stability, robustness of individual features, and true positive and false positive rates assessed on semi-synthetic data with a simulated outcome. We demonstrate the utility of ROOFS on data from the PIONeeR clinical trial, aimed at identifying predictors of resistance to anti-PD-(L)1 immunotherapy in lung cancer. Of the 34 FS methods gathered in ROOFS, we evaluated 23 in combination with 11 classifiers (253 models) and identified a filter based on the union of Benjamini-Hochberg false discovery rate-adjusted p-values from t-test and logistic regression as the optimal approach, outperforming other methods including widely used LASSO. We conclude that comprehensive benchmarking with ROOFS has the potential to improve the reproducibility of FS discoveries and increase the translational value of clinical models.
Related benchmarks
| Task | Dataset | Result | Rank | |
|---|---|---|---|---|
| Feature Selection | PIONeeR post-VIF | -- | 26 |