HEALTHCARE: Predicting Strokes with Machine Learning

Project type

Data Analysis and Machine Learning

Date

May 2025

Tools Used

Python

Skills Used

Data Analysis, Machine Learning, Statistics

- Summary

The purpose of this project is to use available health data to try to predict who is at risk of suffering a stroke. While we couldn’t find a reliable way to predict strokes, we did manage to find a series of variables that we can use to put people into different groups that indicate their chances of suffering a stroke.

- Warning

I’m not a doctor and no doctor has reviewed the findings of this analysis, so please do not make any decisions about your health based on this project.

- Challenges

A big problem with this data is that very few of the people in it suffered a stroke. As a result, simply guessing “No Stroke” for every person gives an accuracy of around 95%, which makes it tricky both to apply machine learning techniques and to test them.
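To make this concrete, here is a minimal sketch using made-up numbers (1,000 people, 50 strokes) rather than the project's data. It shows how an "always No Stroke" guess scores well on accuracy while catching none of the real stroke cases. The sample size and variable names are assumptions chosen for illustration.

```python
# Hypothetical sample: 1,000 people, 50 of whom had a stroke (5%).
# These counts are illustrative, not taken from the project's dataset.
actual = [1] * 50 + [0] * 950

# The "model" that always answers "No Stroke".
predicted = [0] * len(actual)

# Accuracy: share of all predictions that match the real outcome.
correct = sum(a == p for a, p in zip(actual, predicted))
accuracy = correct / len(actual)

# Recall: share of real stroke cases that the model caught.
caught = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
recall = caught / sum(actual)

print(f"accuracy = {accuracy:.2f}")  # 0.95
print(f"recall   = {recall:.2f}")    # 0.00
```

This is why accuracy alone is a poor yardstick here: a measure such as recall on the stroke cases says more about whether a model is useful.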

- Data Preparation

The machine learning algorithms in Python’s scikit-learn library only work with numeric fields. To solve this problem, I used Python to create calculated fields that turn string values into binary fields holding 1 or 0.
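A minimal sketch of this kind of transformation is shown below, assuming pandas is used for the data handling. The column names (`gender`, `ever_married`, `work_type`) and the sample rows are hypothetical and may not match the fields in the project's data.

```python
import pandas as pd

# Hypothetical sample rows; column names and values are illustrative only.
df = pd.DataFrame({
    "gender": ["Male", "Female", "Female"],
    "ever_married": ["Yes", "No", "Yes"],
    "work_type": ["Private", "Self-employed", "Private"],
})

# Two-valued string columns become a single 1/0 column.
df["is_male"] = (df["gender"] == "Male").astype(int)
df["is_married"] = (df["ever_married"] == "Yes").astype(int)

# Columns with more than two values get one 1/0 column per value.
work_dummies = pd.get_dummies(df["work_type"], prefix="work").astype(int)

# Keep only the numeric fields for modelling.
encoded = pd.concat([df[["is_male", "is_married"]], work_dummies], axis=1)
print(encoded)
```

After this step every column in `encoded` is numeric, so it can be passed to a scikit-learn model.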

- Data Analysis

Using a decision tree classification algorithm, we can sort people into the following groups according to their chances of suffering a stroke.
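As a rough illustration of how groups like these can be read off a tree, the sketch below fits scikit-learn's `DecisionTreeClassifier` to synthetic data and treats each leaf as one group with its own stroke rate. The data, the feature names (`age`, `hypertension`), and the settings (`max_depth=3`, `min_samples_leaf=50`) are all assumptions for the example, not the project's actual data or configuration.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in data (NOT the project's dataset): risk rises with age.
rng = np.random.default_rng(0)
n = 2000
X = pd.DataFrame({
    "age": rng.integers(18, 90, size=n),
    "hypertension": rng.integers(0, 2, size=n),
})
risk = 0.01 + 0.15 * (X["age"] > 65) + 0.05 * X["hypertension"]
y = (rng.random(n) < risk).astype(int)

# A shallow tree keeps the number of groups small and readable.
tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=50, random_state=0)
tree.fit(X, y)

# Each leaf is one group; its stroke rate is the share of stroke cases in it.
leaf_ids = tree.apply(X)
groups = (
    pd.DataFrame({"leaf": leaf_ids, "stroke": y})
    .groupby("leaf")["stroke"]
    .agg(people="size", stroke_rate="mean")
    .sort_values("stroke_rate", ascending=False)
)

print(export_text(tree, feature_names=list(X.columns)))  # the split rules
print(groups)  # one row per group, highest stroke rate first
```

Reading the tree this way sidesteps the yes/no prediction problem: even if no leaf predicts "Stroke" outright, the leaves still separate people into lower-risk and higher-risk groups.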
