A Hybrid Feature Selection and Fuzzy C-Means Clustering Framework for Enhanced Heart Disease Prediction Using Machine Learning
DOI: https://doi.org/10.65138/ijmdes.2026.v5i5.308
Abstract
Cardiovascular disease (CVD) is a leading cause of mortality worldwide, and detecting it early can make the difference between a manageable condition and a medical emergency. In this study, we set out to build a more accurate prediction system by combining four feature selection techniques (Chi-Square, Recursive Feature Elimination, Lasso regression, and Random Forest importance) to narrow down the most clinically meaningful predictors from the UCI Heart Disease dataset (303 patients, 13 attributes). We then applied Fuzzy C-Means clustering to uncover hidden patient subgroups, validating the choice of three clusters through four independent checks (Elbow method, Silhouette score, Dunn index, and Gap statistic), all of which agreed. The resulting cluster label was added as an extra feature before training five classifiers: Logistic Regression, Support Vector Machine (SVM), Decision Tree, Random Forest, and Naive Bayes. The SVM performed best with 90.2% accuracy, followed closely by Logistic Regression and Random Forest. Chest pain type, ST depression, heart rate, and the cluster label itself proved to be the strongest predictors. Overall, our results suggest that thoughtful preprocessing, not just more complex algorithms, can meaningfully improve the prediction of cardiovascular risk.
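The clustering step described above can be illustrated with a minimal NumPy sketch of Fuzzy C-Means, followed by appending the hard cluster label as an extra feature column. This is not the authors' implementation: the fuzzifier `m = 2.0`, the random initialisation, the stopping tolerance, and the function name `fuzzy_c_means` are assumptions chosen for illustration, and a small synthetic array stands in for the UCI Heart Disease data.

```python
import numpy as np


def fuzzy_c_means(X, n_clusters=3, m=2.0, max_iter=200, tol=1e-6, seed=0):
    """Hypothetical helper: return (centers, memberships) for X of shape (n, d)."""
    rng = np.random.default_rng(seed)
    # Random initial memberships; each row is normalised to sum to 1.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        # Cluster centers are membership-weighted means (weights raised to m).
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Euclidean distance from every sample to every center.
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)  # floor avoids division by zero
        # Membership update: u_ik is proportional to d_ik^(-2 / (m - 1)).
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


# Synthetic stand-in data (assumption): three loose groups in two dimensions.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 2)) for c in (0.0, 3.0, 6.0)])

centers, U = fuzzy_c_means(X, n_clusters=3)

# Hard label = cluster with the highest membership, appended as a new column.
cluster_label = U.argmax(axis=1)
X_augmented = np.column_stack([X, cluster_label])
```

In a full pipeline, `X_augmented` would then be split into training and test sets and passed to the classifiers. Whether to append the hard label, as here, or the full membership vector is a design choice this sketch does not settle.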
License
Copyright (c) 2026 Kajol Bala

This work is licensed under a Creative Commons Attribution 4.0 International License.