A Hybrid Feature Selection and Fuzzy C-Means Clustering Framework for Enhanced Heart Disease Prediction Using Machine Learning

Authors

  • Kajol Bala, Department of Medical Physics and Biomedical Engineering, Gono University, Dhaka, Bangladesh

DOI:

https://doi.org/10.65138/ijmdes.2026.v5i5.308

Abstract

Cardiovascular disease (CVD) is a leading cause of mortality worldwide, and detecting it early can make the difference between a manageable condition and a medical emergency. In this study, we set out to build a smarter prediction system by combining several feature selection techniques (Chi-Square, Recursive Feature Elimination, Lasso regression, and Random Forest importance) to narrow down the most clinically meaningful predictors from the UCI Heart Disease dataset (303 patients, 13 attributes). We then applied Fuzzy C-Means clustering to uncover hidden patient subgroups, validating the choice of three clusters through four independent checks (Elbow method, Silhouette score, Dunn index, and Gap statistic), all of which agreed. The resulting cluster label was added as an extra feature before training five classifiers: Logistic Regression, Support Vector Machine (SVM), Decision Tree, Random Forest, and Naive Bayes. The SVM came out on top with 90.2% accuracy, followed closely by Logistic Regression and Random Forest. Chest pain type, ST depression, heart rate, and the cluster label itself proved to be the strongest predictors. Overall, our results suggest that thoughtful preprocessing, not just fancier algorithms, can meaningfully improve how well we predict cardiovascular risk.
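
The clustering step described in the abstract can be illustrated with a minimal sketch. This is not the author's implementation: the function names (fuzzy_c_means, add_cluster_feature), the fuzzifier m = 2, the random initialisation, and the toy rows are all assumptions made for illustration, and the toy rows are invented values, not records from the UCI Heart Disease dataset. The sketch runs a plain Fuzzy C-Means loop with three clusters and then appends each row's highest-membership cluster as an extra feature, which is the general idea of feeding a cluster label to a downstream classifier.

```python
import random


def fuzzy_c_means(points, n_clusters=3, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Plain Fuzzy C-Means on a list of equal-length numeric rows.

    Returns (centers, memberships). The fuzzifier m=2.0 is an assumed default.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    exponent = 2.0 / (m - 1.0)

    # Random initial memberships; each row is normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(n_iter):
        # Centers are means of the points weighted by membership ** m.
        centers = []
        for k in range(n_clusters):
            w = [u[i][k] ** m for i in range(n)]
            w_sum = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / w_sum
                 for d in range(dim)]
            )

        # Memberships are recomputed from relative distances to the centers.
        shift = 0.0
        for i in range(n):
            dist = [
                max(sum((points[i][d] - c[d]) ** 2 for d in range(dim)) ** 0.5,
                    1e-12)  # guard against division by zero
                for c in centers
            ]
            new_row = [
                1.0 / sum((dist[k] / dist[j]) ** exponent
                          for j in range(n_clusters))
                for k in range(n_clusters)
            ]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row

        if shift < tol:  # stop once memberships have stabilised
            break
    return centers, u


def add_cluster_feature(points, memberships):
    """Append the index of the highest-membership cluster to every row."""
    return [
        list(p) + [max(range(len(mu)), key=mu.__getitem__)]
        for p, mu in zip(points, memberships)
    ]


# Invented, already-scaled toy rows (two features each), for illustration only.
toy_rows = [
    [0.10, 0.20], [0.15, 0.25], [0.12, 0.18],
    [0.50, 0.55], [0.52, 0.60], [0.48, 0.58],
    [0.90, 0.85], [0.95, 0.90], [0.88, 0.92],
]

centers, memberships = fuzzy_c_means(toy_rows, n_clusters=3)
augmented = add_cluster_feature(toy_rows, memberships)
```

Each row of augmented holds the original features followed by a hard cluster index. Keeping the full membership vector instead of the hard label is an equally plausible design choice, and the abstract does not specify which form was used beyond calling it a cluster label.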

Published

31-05-2026

Section

Articles

How to Cite

[1]
K. Bala, “A Hybrid Feature Selection and Fuzzy C-Means Clustering Framework for Enhanced Heart Disease Prediction Using Machine Learning”, IJMDES, vol. 5, no. 5, pp. 21–26, May 2026, doi: 10.65138/ijmdes.2026.v5i5.308.