A Co-Evolutionary Metaheuristic-HDBSCAN Framework for Feature Selection in eGFR-Based CKD Prediction
Abstract
In medical time-series applications, feature selection is critical because datasets often have high-dimensional, complex feature spaces. As a case study, this work addresses chronic kidney disease (CKD) prediction using a dataset with more than 70 features and more than 500 patient records collected over four years, which makes feature selection a challenging multi-objective optimization problem. To account for temporal variation in these features, a co-evolutionary approach simultaneously optimizes feature selection and clustering parameters: a Genetic Algorithm selects feature subsets, while a Firefly Algorithm determines the minimum cluster size for the HDBSCAN clustering algorithm. The proposed model selects 30 to 40 highly relevant features and achieves 90% to 98% accuracy in predictive models that use a random forest regressor and an XGBoost regressor to estimate eGFR (estimated glomerular filtration rate) for CKD prediction. This dual-optimization strategy aims to reduce dimensionality and improve machine learning performance, producing more interpretable and effective prediction models for clinical use.
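The co-evolutionary loop summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the function `coevolve`, its parameters (`size_bounds`, `pop`, `gens`, `alpha`, `gamma`), and the scoring function `toy_fitness` are hypothetical names introduced here, and the firefly update is a simplified one-dimensional variant. The clustering and regression stages are abstracted behind a `fitness(mask, min_cluster_size)` callable, so the example uses only the Python standard library.

```python
import math
import random


def coevolve(n_features, fitness, size_bounds=(2, 50), pop=12, gens=30,
             alpha=0.05, gamma=1.0, seed=0):
    """Cooperative co-evolution sketch (illustrative, hypothetical API).

    A genetic algorithm evolves boolean feature masks while a firefly-style
    update moves candidate minimum cluster sizes. Each population is scored
    against the best member found so far in the other population.
    fitness(mask, size) must return a float; higher is better.
    """
    rng = random.Random(seed)
    lo, hi = size_bounds
    masks = [[rng.random() < 0.5 for _ in range(n_features)] for _ in range(pop)]
    sizes = [rng.uniform(lo, hi) for _ in range(pop)]
    best_mask, best_size = list(masks[0]), round(sizes[0])
    best_fit = fitness(best_mask, best_size)

    for _ in range(gens):
        # GA step: rank masks using the best cluster size found so far.
        ranked = sorted(masks, key=lambda m: fitness(m, best_size), reverse=True)
        top_fit = fitness(ranked[0], best_size)
        if top_fit > best_fit:
            best_mask, best_fit = list(ranked[0]), top_fit
        parents = ranked[: pop // 2]
        children = [list(best_mask)]  # elitism: keep the best mask
        while len(children) < pop:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation with probability 1 / n_features per gene.
            child = [(not g) if rng.random() < 1.0 / n_features else g
                     for g in child]
            children.append(child)
        masks = children

        # Firefly step: each candidate size moves toward brighter candidates.
        light = [fitness(best_mask, round(s)) for s in sizes]
        moved = []
        for i, s in enumerate(sizes):
            for j, t in enumerate(sizes):
                if light[j] > light[i]:
                    d = (t - s) / (hi - lo)          # normalized distance
                    beta = math.exp(-gamma * d * d)  # attractiveness decay
                    s += beta * (t - s) + alpha * (rng.random() - 0.5) * (hi - lo)
            moved.append(min(hi, max(lo, s)))        # clamp to the bounds
        sizes = moved
        for s in sizes:
            f = fitness(best_mask, round(s))
            if f > best_fit:
                best_size, best_fit = round(s), f

    return best_mask, best_size, best_fit


def toy_fitness(mask, size):
    """Stand-in score (assumption): rewards picking the first three of eight
    features and a cluster size near 10. Replace with a real evaluation."""
    target = [True, True, True, False, False, False, False, False]
    hamming = sum(m != t for m, t in zip(mask, target))
    return -hamming - abs(size - 10) / 10


mask, size, fit = coevolve(8, toy_fitness)
print(mask, size, fit)
```

In a realistic setting, `fitness` would presumably cluster the selected columns with HDBSCAN at the given minimum cluster size and return a cluster-validity or downstream regression score; that wiring is left out here because the abstract does not specify it.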