HEALTHCARE: Predicting Strokes with Machine Learning
Project type
Data Analysis and Machine Learning
Date
May 2025
Tools Used
Python
Skills Used
Data Analysis, Machine Learning, Statistics
Python Code
- Summary
This project uses available health data to try to predict who is at risk of suffering a stroke. While we couldn’t find a reliable way to predict strokes, we did identify a set of variables that sort people into groups with different chances of suffering a stroke.
- Warning
I’m not a doctor and no doctor has reviewed the findings of this analysis, so please do not make any decisions about your health based on this project.
- Challenges
A big problem with this data is that very few of the people in it suffered a stroke. Simply guessing “No Stroke” for every person gives an accuracy of around 95%, which makes it tricky to apply machine learning techniques and to judge how well they work.
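To make the imbalance problem concrete, here is a minimal sketch using made-up labels (5 strokes in 100 people, an assumption for illustration and not the project's actual data). It shows how a "model" that always answers “No Stroke” scores high on accuracy while catching none of the stroke cases.

```python
# Hypothetical labels: 1 = stroke, 0 = no stroke (5% positive, made up for illustration)
labels = [1] * 5 + [0] * 95

# A "model" that always predicts the majority class, "No Stroke"
predictions = [0] * len(labels)

# Accuracy: share of all predictions that match the label
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# Recall: share of actual stroke cases that the model caught
true_positives = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
recall = true_positives / sum(labels)

print(f"accuracy={accuracy:.2f}, recall={recall:.2f}")
```

This is why accuracy alone is a poor yardstick here: a measure such as recall, which looks only at the people who actually had a stroke, exposes what the accuracy figure hides.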
- Data Preparation
The machine learning algorithms in Python’s scikit-learn library only work with numeric fields. To solve this problem, I used Python to create calculated fields that turn each string value into a binary field holding 1 or 0.
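The transformation described above might look something like the following sketch, which assumes pandas is used. The column names and values are hypothetical and are not taken from the project's dataset.

```python
import pandas as pd

# Hypothetical rows; column names and values are assumptions for illustration
df = pd.DataFrame({
    "age": [67, 45, 80],
    "ever_married": ["Yes", "No", "Yes"],
    "residence_type": ["Urban", "Rural", "Urban"],
})

# A Yes/No column maps directly onto 1/0
df["ever_married"] = df["ever_married"].map({"Yes": 1, "No": 0})

# Any other two-valued column becomes a calculated 1/0 field
df["is_urban"] = (df["residence_type"] == "Urban").astype(int)
df = df.drop(columns=["residence_type"])

print(df)
```

Columns with more than two categories would need one such binary field per category.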
- Data Analysis
Using a Decision Tree Classification algorithm, we can classify people into the following groups, each with its own chance of suffering a stroke.
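As a rough illustration of how a decision tree produces such groups, the sketch below fits scikit-learn's DecisionTreeClassifier to a handful of made-up records. The features, the values, and the max_depth setting are all assumptions, not the project's actual data or configuration. Each leaf of the fitted tree is one group, and the share of stroke cases in that leaf is the group's estimated chance.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Made-up records: [age, hypertension]; labels: 1 = stroke, 0 = no stroke
X = [[25, 0], [32, 0], [41, 0], [47, 1], [58, 0],
     [63, 1], [70, 1], [74, 0], [79, 1], [85, 1]]
y = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]

# A shallow tree keeps the number of groups small (max_depth=2 is an assumption)
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X, y)

# Print the rules that define each group
print(export_text(tree, feature_names=["age", "hypertension"]))

# Collect the labels that land in each leaf, then report the stroke rate per group
groups = {}
for leaf, label in zip(tree.apply(X), y):
    groups.setdefault(int(leaf), []).append(label)

for leaf, members in sorted(groups.items()):
    rate = sum(members) / len(members)
    print(f"group {leaf}: {len(members)} people, stroke rate {rate:.0%}")
```

Reading the tree this way gives a risk grouping without claiming to predict individual outcomes.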






