5  Conclusion

5.1 Main Takeaways:

  • Pedestrians are the most vulnerable people in collisions, while drivers usually fare comparatively better.

  • It pays to be extra careful around sedans, station wagons, sport utility vehicles and taxis, since they are the vehicle types most often involved in collisions in our data.

  • Drivers could be better educated on how to stay attentive while driving and on right-of-way rules, and on why both matter for reducing collisions and their consequences.

  • The Mystery of Staten Island: While Staten Island had far fewer collisions overall, its number of collisions did not change at all during the COVID-19 mandated lockdown. Was there no lockdown in Staten Island!? There probably was, so there should have been some change in the pattern of collision counts, and oddly there wasn’t. Unsolved mystery.

  • Awareness of dangerous times and areas can help make drivers more alert and cautious while driving.

5.2 Limitations:

  • The data files are very large (414 MB and 800 MB). GitHub warns about files larger than 50 MB and blocks files larger than 100 MB, so they could not be pushed to the repository, and the interactive component was compromised as a result.

  • The collected data had many NA values, and its sheer volume hindered some analysis ideas. For example, the vehicle type variable has almost 1,600 unique values, so plotting it against other variables was not practically feasible.

  • Some data was highly skewed, so some experimental graphs didn’t produce any visible insights. For instance, the number of people injured is far larger than the number of people killed, so much so that the two were not even comparable on the same scale.
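
One common way to make a variable with roughly 1,600 unique values plottable is to keep only the most frequent labels and pool the rest. The sketch below is illustrative only and is not the cleaning used in this analysis: the function name `collapse_rare`, the labels "Other" and "Unknown", and the sample values are all assumptions.

```python
from collections import Counter


def collapse_rare(values, top_n=5, other="Other", missing="Unknown"):
    """Keep the top_n most frequent labels and pool the rest into `other`."""
    # Normalise case and whitespace so "sedan " and "Sedan" count as one label.
    # None and empty strings are treated as missing values.
    cleaned = [
        v.strip().title() if isinstance(v, str) and v.strip() else missing
        for v in values
    ]
    # Rank only the non-missing labels by frequency.
    counts = Counter(c for c in cleaned if c != missing)
    keep = {label for label, _ in counts.most_common(top_n)}
    return [c if c in keep or c == missing else other for c in cleaned]


# Hypothetical sample of raw vehicle-type entries (not taken from the dataset).
vehicles = ["Sedan", "sedan ", "Taxi", "TAXI", "Sedan", None, "Moped", "Van", ""]
collapsed = collapse_rare(vehicles, top_n=2)
print(Counter(collapsed))
```

With the rare labels pooled, the variable has at most `top_n + 2` levels, which is small enough to facet or colour by in a plot. The choice of `top_n` is a judgment call and would need to be checked against the real frequency distribution.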

5.3 Future Directions:

  • This analysis can easily be extended to more cities and states. More accurate and inclusive insights could then be gathered from the data, which in turn might help in devising effective solutions.

  • Building on this exploratory analysis, we could employ modelling techniques to identify which areas are more prone to collisions, and make those areas safer (e.g., by adding speed bumps, stop signs and traffic lights).
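
As a possible first step before any real modelling, collision-prone areas could be approximated by counting collisions per grid cell. This is a simple counting baseline, not a model, and it is only a sketch: the function `grid_cell`, the 0.01-degree cell size, and the coordinates are hypothetical and not drawn from the dataset.

```python
import math
from collections import Counter


def grid_cell(lat, lon, cell_deg=0.01):
    """Map a coordinate to the integer (row, column) index of its grid cell."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


# Hypothetical collision coordinates as (latitude, longitude) pairs.
points = [
    (40.7128, -74.0060),
    (40.7130, -74.0055),
    (40.7125, -74.0068),
    (40.6505, -73.9496),
    (40.5795, -74.1502),
]

# Count collisions per cell and rank the cells from most to least collisions.
cell_counts = Counter(grid_cell(lat, lon) for lat, lon in points)
ranked = cell_counts.most_common()
print(ranked[0])
```

Raw counts favour busy areas, so a fuller analysis would probably normalise by traffic volume or road length before calling a cell dangerous.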

5.4 Lessons Learned:

  • Learned to work with a big dataset (5 million observations) with skewness and many missing values.

  • Learned to clean very dirty data: the collected data had to be cleaned in many ways.