Disparities and flat tires

The job of a data scientist is like that of dish soap in a tub. By looking at a tire for an instant, you can tell whether it's properly inflated, and by looking at it over time, you can tell that it has a slow leak. But you don't know whether that leak can be plugged, or where it is on the inner tube, until you submerge the tube in soapy water and rotate it. (For those who have never fixed an inner tube, you're welcome.)

Data for Seattle or King County as a whole can tell you that there is a disparity. Data for Seattle over time is even better, because it can tell you how quickly the disparity is changing. But to do something about the disparity, it helps to have data for areas smaller than the city and for different groups within those areas, so you can diagnose where the disparity is most acute.
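One way to make "how quickly the disparity is changing" concrete is to express the disparity in each tract as the ratio of two groups' rates, then take the average change in that ratio per year. The sketch below assumes that ratio definition (a difference in rates would work the same way). The functions `disparity_ratio` and `average_annual_change`, the tract names, and all rates are hypothetical placeholders, not Seattle data.

```python
# Sketch: a per-tract disparity ratio and its average change per year.
# ASSUMPTIONS: disparity = rate for group A / rate for group B (1.0 = parity);
# every tract name and rate below is hypothetical.


def disparity_ratio(rate_group_a: float, rate_group_b: float) -> float:
    """Ratio of group A's rate to reference group B's rate (1.0 means parity)."""
    if rate_group_b == 0:
        raise ValueError("reference group rate must be non-zero")
    return rate_group_a / rate_group_b


def average_annual_change(series: dict[int, float]) -> float:
    """Average change per year between the first and last observed years."""
    years = sorted(series)
    first, last = years[0], years[-1]
    if first == last:
        raise ValueError("need at least two distinct years")
    return (series[last] - series[first]) / (last - first)


# Hypothetical rates per 1,000 residents: (group_a, group_b) by tract and year.
tract_rates = {
    "Tract A": {2010: (12.0, 6.0), 2014: (9.0, 6.0)},
    "Tract B": {2010: (8.0, 8.0), 2014: (10.0, 5.0)},
}

summary = {}
for tract, by_year in tract_rates.items():
    ratios = {year: disparity_ratio(a, b) for year, (a, b) in by_year.items()}
    summary[tract] = {
        "latest_ratio": ratios[max(ratios)],  # ratio in the most recent year
        "change_per_year": average_annual_change(ratios),
    }

# The tract where the disparity is currently largest.
worst_tract = max(summary, key=lambda t: summary[t]["latest_ratio"])
```

In this made-up data, the disparity in Tract A is shrinking (from 2.0 to 1.5) while the disparity in Tract B is growing (from parity to 2.0) and is now the largest. A single city-wide figure could hide that kind of pattern.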

Below are two views of a Google Map with census tracts color-coded by values from extracted 2010 census data and 2014 crime data for Seattle.

By clicking on the legend and viewing the data table, you can explore the numbers behind the color gradations for each census tract. At a glance you may see a correlation between the colors in the census map and the colors in the crime map, but correlation is not causation, and more data is needed to establish a relationship between the crime and census variables.
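A choropleth map's color gradations are bins over the underlying numbers, so one color can stand for a fairly wide range of values. The sketch below assumes equal-interval bins. The maps described here may use a different classification scheme. The helpers `equal_interval_breaks` and `classify`, the tract values, and the hex colors are all hypothetical.

```python
from bisect import bisect_left

# Sketch: how tract values might be binned into map colors.
# ASSUMPTION: equal-interval classes; values and colors are hypothetical.


def equal_interval_breaks(values, n_classes):
    """Upper bounds of the first n_classes - 1 equal-width bins."""
    if n_classes < 1:
        raise ValueError("n_classes must be at least 1")
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes
    return [lo + width * i for i in range(1, n_classes)]


def classify(value, breaks):
    """0-based class index: a value at or below breaks[i] falls in class i."""
    return bisect_left(breaks, value)


# Hypothetical tract values and a three-step color ramp (light to dark).
tract_values = {"Tract A": 10, "Tract B": 25, "Tract C": 40, "Tract D": 55}
colors = ["#fee8c8", "#fdbb84", "#e34a33"]

breaks = equal_interval_breaks(list(tract_values.values()), len(colors))
tract_colors = {t: colors[classify(v, breaks)] for t, v in tract_values.items()}
```

Here Tracts A and B get the same color even though B's value is two and a half times A's. This is why the numbers in the data table are worth checking before reading too much into the colors.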

The image below is a correlation matrix created with the excellent "corrplot" package in R. It gives a simple visual summary of the correlations among the variables in the maps' data table. All of the correlations in the matrix are significant at the 95% confidence level, though you can see that the correlation is strongest and most positive between the two types of assault.

[Figure: correlation matrix of the map variables (map-correlation.png)]
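The numbers behind a matrix like this are pairwise Pearson correlations, each with a test of whether it differs from zero. The sketch below computes both in plain Python without drawing anything. One substitution should be noted: R's `cor.test` uses a t-test for Pearson's r, but the standard library has no t distribution, so this sketch uses the Fisher z approximation with `statistics.NormalDist`. The variable names and tract values are hypothetical, not the data behind the figure.

```python
import math
from itertools import combinations
from statistics import NormalDist

# Hypothetical tract-level values (one entry per tract); NOT the real map data.
data = {
    "poverty_rate": [5, 8, 12, 15, 20, 22, 30, 35],
    "aggravated_assault": [2, 3, 5, 4, 7, 9, 10, 12],
    "simple_assault": [6, 8, 13, 11, 18, 22, 25, 30],
}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_z_significant(r, n, confidence=0.95):
    """Approximate two-sided test of rho = 0 via Fisher's z-transform."""
    if n <= 3:
        raise ValueError("need more than 3 observations")
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite when |r| is ~1
    z = math.atanh(r) * math.sqrt(n - 3)
    critical = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # ~1.96 at 95%
    return abs(z) > critical


def correlation_matrix(columns):
    """Symmetric matrix of pairwise correlations plus a significance flag per pair."""
    names = list(columns)
    n = len(columns[names[0]])
    matrix = {a: {b: 1.0 if a == b else 0.0 for b in names} for a in names}
    significant = {}
    for a, b in combinations(names, 2):
        r = pearson(columns[a], columns[b])
        matrix[a][b] = matrix[b][a] = r
        significant[(a, b)] = fisher_z_significant(r, n)
    return matrix, significant


matrix, significant = correlation_matrix(data)
# Most positive off-diagonal pair.
strongest = max(significant, key=lambda pair: matrix[pair[0]][pair[1]])
```

With only a handful of tracts, the approximation is rough. For real analysis, an exact test such as R's `cor.test` is the safer choice.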

This is a very simple example, but hopefully it begins to show how adding views and gathering additional data series can point toward disparities, and how, with diligence, it may be possible to find where and for whom those disparities are most evident.