Disparities and flat tires

The job of a data scientist is like that of dish soap in a tub of water. By looking at a tire at a single moment, you can tell whether it's properly inflated, and by looking at it over time you can tell that it has a slow leak. But you don't know where the leak is on the inner tube, or whether it can be plugged, until you submerge the tube in soapy water and rotate it. (For those who have never fixed an inner tube: you're welcome.)

Data for Seattle or King County as a whole can tell you that a disparity exists. Data for Seattle over time is even better, because it tells you how quickly the disparity is changing. But to do something about the disparity, it helps to have data for areas smaller than the city and for different groups within those areas, so you can diagnose where the disparity is most acute.
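To make the point about smaller areas concrete, here is a minimal sketch with made-up numbers (nothing here is real Seattle or King County data). It assumes you have, for each hypothetical area, a count of some outcome and a population for two groups, and it compares a rate ratio (group A's rate divided by group B's) for the city as a whole with the same ratio area by area. The names `AREAS`, `rate`, `disparity_ratio`, and `citywide_ratio` are my own, for illustration only.

```python
# A minimal sketch with HYPOTHETICAL counts (not real Seattle data): for each
# area, (events, population) for two groups, e.g. some outcome of interest.
AREAS = {
    "Area 1": {"group_a": (30, 1000), "group_b": (10, 1000)},
    "Area 2": {"group_a": (10, 1000), "group_b": (10, 1000)},
    "Area 3": {"group_a": (10, 1000), "group_b": (10, 1000)},
}


def rate(events, population, per=1000):
    """Events per `per` residents."""
    return events * per / population


def disparity_ratio(group_a, group_b):
    """Group A's rate divided by group B's rate; 1.0 means parity."""
    return rate(*group_a) / rate(*group_b)


def citywide_ratio(areas):
    """Pool counts across all areas, then compute a single ratio."""
    totals = {"group_a": [0, 0], "group_b": [0, 0]}
    for groups in areas.values():
        for name, (events, population) in groups.items():
            totals[name][0] += events
            totals[name][1] += population
    return disparity_ratio(totals["group_a"], totals["group_b"])


if __name__ == "__main__":
    print(f"Citywide: {citywide_ratio(AREAS):.2f}")
    for area, groups in AREAS.items():
        print(f"{area}: {disparity_ratio(groups['group_a'], groups['group_b']):.2f}")
```

With these invented numbers the citywide ratio comes out around 1.67, which hides the fact that the whole gap sits in Area 1 (ratio 3.00) while Areas 2 and 3 are at parity.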

Below are two views of a Google Map with color-coded census tracts, shaded using values extracted from 2010 census data and 2014 crime data for Seattle.

By clicking on the legend and viewing the data table, you can explore the numbers behind the color gradations for each census tract. At a glance you may see a correlation between the colors in the census map and the colors in the crime map, but since correlation is not causation, more data is needed to establish a relationship between the crime and census data.
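If you're curious how numbers turn into color gradations, here is a minimal sketch of one common approach, equal-interval classification: split the range of values into equally wide bins and give each bin a color. This is an assumption for illustration; I'm not claiming it is exactly how Google's map tool assigns colors. The palette, tract names, and values are hypothetical.

```python
import bisect

# HYPOTHETICAL light-to-dark palette, chosen only for illustration.
PALETTE = ["#fee5d9", "#fcae91", "#fb6a4a", "#cb181d"]


def equal_interval_breaks(values, n_classes):
    """Upper bounds of the first n_classes - 1 classes, evenly spaced."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes
    return [lo + width * i for i in range(1, n_classes)]


def classify(value, breaks):
    """0-based class index; a value equal to a break falls in the lower class."""
    return bisect.bisect_left(breaks, value)


if __name__ == "__main__":
    # HYPOTHETICAL tract values, e.g. crimes per 1,000 residents.
    tract_values = {"Tract A": 0, "Tract B": 10, "Tract C": 25, "Tract D": 40}
    breaks = equal_interval_breaks(list(tract_values.values()), len(PALETTE))
    for tract, value in tract_values.items():
        print(tract, value, PALETTE[classify(value, breaks)])
```

Equal intervals are easy to read off a legend, but a few extreme tracts can push most others into the lightest bin; quantile breaks (equal numbers of tracts per color) are the usual alternative.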

The image below is a correlation matrix created with the great "corrplot" package in R. It provides a simple visual of the correlations among the variables in the data table of the maps above. All of the correlations in the matrix are statistically significant at the 95% confidence level, but the correlation is strongest and most positive between the two types of assault.

[Image: correlation matrix of the census and crime variables (map-correlation.png)]
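The matrix above was made in R with corrplot. For readers who want to see what sits behind each cell, here is a rough Python equivalent of the arithmetic, not the code I used: the Pearson correlation for every pair of variables, plus the t statistic used to test whether a correlation differs from zero (with n - 2 degrees of freedom). The column names and values are hypothetical. To turn t into a significance call you'd compare |t| against a critical value from a t table, or use a library such as SciPy, whose `scipy.stats.pearsonr` returns both r and a p-value.

```python
import math
from itertools import combinations

# HYPOTHETICAL tract-level variables (six tracts); not the data behind the map.
COLUMNS = {
    "aggravated_assault": [12, 30, 7, 22, 15, 40],
    "simple_assault": [20, 55, 15, 41, 25, 70],
    "pct_renters": [35, 60, 28, 52, 40, 71],
}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom (needs |r| < 1)."""
    return r * math.sqrt((n - 2) / (1 - r * r))


def correlation_matrix(columns):
    """Dict mapping every (name_i, name_j) pair to its Pearson r."""
    return {(a, b): pearson(columns[a], columns[b]) for a in columns for b in columns}


if __name__ == "__main__":
    n = len(next(iter(COLUMNS.values())))
    matrix = correlation_matrix(COLUMNS)
    # Only off-diagonal pairs: the diagonal is always r = 1.
    for a, b in combinations(COLUMNS, 2):
        r = matrix[(a, b)]
        print(f"{a} vs {b}: r = {r:.2f}, t = {t_statistic(r, n):.2f} (df = {n - 2})")
```

A correlation matrix is symmetric, so corrplot-style plots often show only the upper or lower triangle.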

This is a very simple example, but hopefully it begins to show how adding views and gathering additional data series can point towards disparities, and how with diligence it might be possible to find where and for whom those disparities are most evident.

A good example of a disparity graph in the news

I didn't watch the Oscars, but I did search online for Chris Rock's monologue. In the coverage of the event I found this cool, interactive, but simple graph that elegantly shows the disproportion in the race of Oscar nominees by comparing it with other relevant racial proportions. I am a big fan of "disproportion looking like a cigarette filter" bar graphs because they make me consciously associate how unfair something is with how much tar someone would have to inhale if they smoked a cigarette with a filter of that size. In my case this probably comes from too many viewings of The Fifth Element and Bruce Willis's complaint about cigarettes in the future, but you can choose a graph style that speaks to you for some other inane reason.

Mapping using census data

In my last post, from the distant past, I explained the differences between the various types of census data and briefly discussed some of their limitations: they undercount certain populations that are often the focus of equity issues, such as small population groups, people with no or unstable housing, and the incarcerated. But census data can still be a powerful tool for visualizing disparities.

Below is a map I created in Google My Maps, a powerful tool that lets people without Jedi-level GIS skills create interactive maps with editable data. The map shows the census tracts for Seattle (census tracts are small statistical subdivisions of a county, typically home to a few thousand people, and are a standard geographic unit for reporting decennial census data).

Tract numbers are unique within a county but can repeat across counties in the same state; since Seattle lies entirely within King County, each tract number identifies a single tract in the city. I created the map from a .kml version of the census tract boundary file. I made mine by downloading the .shp (shapefile) version of the Seattle census tract file from Seattle's Open Data site, http://data.seattle.gov/, and converting it to .kml in ArcGIS, but fortunately the Seattle Open Data site now also offers a .KML version of the file. If you ever need to do the conversion yourself and don't have access to ArcGIS, a number of good, free utilities can handle it.
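For the curious, a .kml file is just XML, so the output side of a shapefile-to-KML conversion is simpler than it sounds. Below is a minimal sketch that writes tract polygons as KML 2.2 Placemarks using only Python's standard library. It assumes you've already read the tract boundaries out of the shapefile (with a GIS tool or library) into plain lists of (longitude, latitude) points in WGS84; if your shapefile uses a projected coordinate system, you'd need to reproject first. The function name and sample coordinates are hypothetical, and it handles only simple polygons without holes.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"
ET.register_namespace("", KML_NS)  # serialize as the default xmlns


def _tag(name):
    return f"{{{KML_NS}}}{name}"


def tracts_to_kml(tracts):
    """tracts: dict of tract name -> list of (lon, lat) outer-ring vertices."""
    kml = ET.Element(_tag("kml"))
    doc = ET.SubElement(kml, _tag("Document"))
    for name, ring in tracts.items():
        ring = list(ring)
        if ring[0] != ring[-1]:
            ring.append(ring[0])  # KML LinearRings must start and end on the same point
        placemark = ET.SubElement(doc, _tag("Placemark"))
        ET.SubElement(placemark, _tag("name")).text = name
        polygon = ET.SubElement(placemark, _tag("Polygon"))
        outer = ET.SubElement(polygon, _tag("outerBoundaryIs"))
        linear_ring = ET.SubElement(outer, _tag("LinearRing"))
        # KML coordinate order is longitude,latitude[,altitude]
        ET.SubElement(linear_ring, _tag("coordinates")).text = " ".join(
            f"{lon},{lat},0" for lon, lat in ring
        )
    return ET.tostring(kml, encoding="unicode")


if __name__ == "__main__":
    # HYPOTHETICAL triangle near downtown Seattle, not a real tract boundary.
    print(tracts_to_kml({"Tract X": [(-122.35, 47.60), (-122.34, 47.60), (-122.34, 47.61)]}))
```

A frequent gotcha is coordinate order: KML wants longitude first, so swapped values will land your tracts somewhere far from Seattle.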

Below is a blank map I also created of Seattle's Public Use Microdata Areas (PUMAs), the basic geographic unit of the American Community Survey (ACS) Public Use Microdata Sample (PUMS).

Why do the decennial census and the ACS use different geographic units? No, it's not just orneriness. The ACS isn't a true count; it surveys a sample of the population each year (in a later post I will discuss what this does to ACS estimates). So that each area contains a sizable number of people for every year's sample, PUMAs are drawn to contain at least 100,000 residents each, which means their size varies across the country with population density.
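Because the ACS is a sample, you can't just count PUMS records: each respondent stands in for some number of people, recorded as a weight (the person weight column in PUMS person files is PWGTP). Here is a minimal sketch, with hypothetical records and PUMA labels, of turning weighted records into an estimated population and an estimated share per PUMA. It deliberately ignores sampling error, which I'm saving for that later post.

```python
from collections import defaultdict

# HYPOTHETICAL person records: (PUMA, person weight, has the characteristic of
# interest). In real ACS PUMS person files the weight column is PWGTP.
RECORDS = [
    ("PUMA A", 20, True),
    ("PUMA A", 30, False),
    ("PUMA A", 50, True),
    ("PUMA B", 40, False),
    ("PUMA B", 60, True),
]


def weighted_estimates(records):
    """Per PUMA: (estimated population, estimated share with the characteristic)."""
    population = defaultdict(float)
    with_trait = defaultdict(float)
    for puma, weight, has_trait in records:
        population[puma] += weight  # each respondent represents `weight` people
        if has_trait:
            with_trait[puma] += weight
    return {p: (population[p], with_trait[p] / population[p]) for p in population}


if __name__ == "__main__":
    for puma, (pop, share) in weighted_estimates(RECORDS).items():
        print(f"{puma}: ~{pop:.0f} people, {share:.0%} with the characteristic")
```

Note the difference weights make: an unweighted count of PUMA A's sample would say two out of three people have the characteristic, while the weighted estimate is 70%.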

Using the PUMA map and the census tract maps, we can begin to link geographic data with other administrative data that is available for the same geographic extents.  By combining those data with information about groups that interest you, you will be able to get a better sense of where disparities exist.
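In practice, linking tract-level data usually comes down to a join on a shared key. The Census Bureau's standard tract identifier (GEOID) is 11 digits: a 2-digit state code, a 3-digit county code, and a 6-digit tract code with an implied decimal point (Washington is 53 and King County is 033, so tract 53.01 becomes 53033005301). Below is a minimal sketch that builds those keys and joins hypothetical tract populations with hypothetical crime counts into a rate per 1,000 residents. It assumes the crime data has already been aggregated to tracts, and it treats a tract missing from the crime data as having zero incidents, which is an assumption you'd want to check against your own data.

```python
# HYPOTHETICAL inputs: tract populations and crime counts are made up.
WA_FIPS, KING_FIPS = "53", "033"


def tract_code(tract_number):
    """'53.01' -> '005301', '53' -> '005300' (4-digit base + 2-digit suffix)."""
    base, _, suffix = tract_number.partition(".")
    return base.zfill(4) + suffix.ljust(2, "0")


def geoid(state_fips, county_fips, tract_number):
    """11-digit tract GEOID: state (2) + county (3) + tract (6)."""
    return state_fips + county_fips + tract_code(tract_number)


def rates_per_1000(population, counts):
    """Join on GEOID; ASSUMES tracts missing from `counts` had zero incidents."""
    return {g: counts.get(g, 0) * 1000 / pop for g, pop in population.items() if pop > 0}


def unmatched(population, counts):
    """GEOIDs in the crime data with no census match (often a key-format problem)."""
    return sorted(set(counts) - set(population))


if __name__ == "__main__":
    population = {
        geoid(WA_FIPS, KING_FIPS, "53.01"): 4000,
        geoid(WA_FIPS, KING_FIPS, "53.02"): 5000,
    }
    crimes = {geoid(WA_FIPS, KING_FIPS, "53.01"): 20, "5303353.03": 7}  # 2nd key malformed
    print(rates_per_1000(population, crimes))
    print(unmatched(population, crimes))
```

Checking for unmatched keys is worth the extra line: tract numbers stored as "53.01" in one file and "005301" in another will silently fail to join.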