I would like to share the insights I gained in the course of my capstone project, working with car accident data from Seattle, USA.
The purpose of this exploratory data analysis is to assess whether car accident severity in Seattle, USA can be predicted accurately by means of supervised machine learning models, exploiting collision records from past accidents that were recorded by the Seattle Police Department (SPD) and provided as open data by Traffic Records.
Being able to predict car accident severity from external factors like weather, location, and road conditions, as well as speeding and the influence of alcohol or drugs, would allow the government to put appropriate measures in place to reduce accident severity and, above all, allow the police and first-response teams to channel their resources and increase efficiency.
Using car accident records from March 2013, three different machine-learning methods, namely K-Nearest Neighbours (KNN), Decision Trees, and Logistic Regression, were benchmarked against each other.
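The benchmarking step above can be sketched with scikit-learn. This is a minimal illustration, not the notebook's actual pipeline: the real project uses the SPD collision dataset, while here a synthetic, imbalanced stand-in is generated so the snippet runs on its own.

```python
# Hedged sketch: benchmarking the three model families named in the post
# (KNN, Decision Trees, Logistic Regression) via cross-validated F1 score.
# The dataset below is synthetic; the real notebook uses SPD collision records.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

# Synthetic binary-severity data with a class imbalance similar in spirit
# to injury vs. property-damage-only collisions (exact ratio is illustrative).
X, y = make_classification(n_samples=1000, n_features=10,
                           weights=[0.7, 0.3], random_state=42)

models = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=42),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

# 5-fold cross-validated F1 gives a fairer comparison than accuracy
# when one severity class dominates the data.
scores = {name: cross_val_score(m, X, y, cv=5, scoring="f1").mean()
          for name, m in models.items()}
for name, f1 in scores.items():
    print(f"{name}: mean F1 = {f1:.3f}")
```

Hyperparameters such as `n_neighbors=5` and `max_depth=5` are placeholder choices; the notebook tunes its own.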
The exploratory data analysis suggests that almost 90% of accidents involving pedestrians and 88% of collisions involving cyclists lead to injuries (compared with 28% of accidents without pedestrians or cyclists). Nevertheless, the tested machine learning models generally struggled to correctly predict `SEVERITYCLASS=2` and therefore exhibited a high number of *false negatives* for this class, which in real life could lead to a misallocation of police and first-responder resources and potentially prove deadly.
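The false-negative problem for the injury class can be made concrete with a confusion matrix, alongside one common mitigation, class weighting. Again this is a hedged sketch on synthetic data: label `1` stands in for the injury class, and the model and split are illustrative rather than the notebook's actual setup.

```python
# Hedged sketch: counting false negatives for the minority (injury) class
# and trying class weighting as a mitigation. Data and model are synthetic
# stand-ins, not the capstone's real pipeline.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

# Synthetic imbalanced data: label 1 plays the role of the injury class.
X, y = make_classification(n_samples=2000, weights=[0.72, 0.28],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

for weight in (None, "balanced"):
    clf = LogisticRegression(class_weight=weight, max_iter=1000)
    clf.fit(X_tr, y_tr)
    # ravel() on a 2x2 confusion matrix yields (tn, fp, fn, tp);
    # fn counts injury cases the model missed.
    tn, fp, fn, tp = confusion_matrix(y_te, clf.predict(X_te)).ravel()
    print(f"class_weight={weight}: {fn} false negatives "
          f"out of {fn + tp} injury cases")
```

`class_weight="balanced"` typically trades some precision for fewer missed injury cases, which matters when a false negative means under-dispatching first responders.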
The link to my Jupyter Notebook on GitHub can be found here:
I hope you find it interesting and helpful for your submission!