Another Meaning for “Use the Damn Data”

by John W Rodat on April 7, 2015

Ben Wellington, who runs the excellent I Quant New York, did a nice job here unpacking NYC taxi charge data and found that two different systems installed in taxis calculate driver tips differently.

And he estimates the more generous (?) system produces $5.2 million in tips above what the other system provides. Of course, riders have been unaware, and Wellington’s analysis suggests that the Taxi Commission and drivers were probably unaware as well. This is why we should “use the damn data.” And it’s why public data should be open to the public. Even the best-intentioned and most capable public officials do not have the time or resources to explore it all. Making data public enables people like Wellington to do their own explorations.

So, even when it’s in a good visualization, it’s not just a matter of looking at the data. It’s also a matter of thinking about it: following its logic and asking what each type of data really represents and how the different fields relate to one another (and often to what’s missing).
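To make "how different fields relate to one another" a bit more concrete, here is a minimal Python sketch. It is not Wellington's code and it does not use the real taxi data: the records, the field names (`fare`, `surcharge`, `tax`, `tolls`, `tip`), the system labels "A" and "B", and the two candidate tip bases are all invented for illustration. The idea is simply to divide each tip by different combinations of the raw fields and see which combination yields the same percentage on every trip for a given system.

```python
# Toy records: every field name and number here is made up for illustration.
trips = [
    {"system": "A", "fare": 10.00, "surcharge": 0.50, "tax": 0.50, "tolls": 0.00, "tip": 2.10},
    {"system": "A", "fare": 20.00, "surcharge": 0.50, "tax": 0.50, "tolls": 5.00, "tip": 4.10},
    {"system": "B", "fare": 10.00, "surcharge": 0.50, "tax": 0.50, "tolls": 0.00, "tip": 2.20},
    {"system": "B", "fare": 20.00, "surcharge": 0.50, "tax": 0.50, "tolls": 5.00, "tip": 5.20},
]

# Hypothetical ways a payment system might define the amount a tip is based on.
CANDIDATE_BASES = {
    "fare + surcharge": ("fare", "surcharge"),
    "fare + surcharge + tax + tolls": ("fare", "surcharge", "tax", "tolls"),
}


def tip_share(trip, fields):
    """Return the tip as a fraction of the sum of the given fields."""
    base = sum(trip[f] for f in fields)
    return round(trip["tip"] / base, 3)


def consistent_bases(trips):
    """For each system, list the candidate bases that give one tip share on every trip."""
    result = {}
    for system in sorted({t["system"] for t in trips}):
        rows = [t for t in trips if t["system"] == system]
        result[system] = [
            name
            for name, fields in CANDIDATE_BASES.items()
            if len({tip_share(t, fields) for t in rows}) == 1
        ]
    return result


if __name__ == "__main__":
    for system, bases in consistent_bases(trips).items():
        print(system, bases)
```

In this toy data, each system's tips line up with a different base, which is the kind of pattern that only shows up when you go back to the individual fields rather than a summary chart.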

Wellington makes some cogent suggestions. Here are the summary points, but you ought to take a look at his post:

  • The TLC (Taxi and Limousine Commission) should fix the in-cab payment systems to make them consistent with one another.
  • The TLC should release taxi data directly to the public, not through FOIL.
  • When doing data science, look at the raw data.
  • Link to or publish your data sources.

All excellent suggestions. Kudos to Ben Wellington. 

And, by the way, check this post of Wellington’s as well. If you’ve ever had a NYC MTA MetroCard with value left on it that you never used up, this is an excellent example of how Open Data can lead to very concrete suggestions that save the public a lot of money. The amount at stake in this one is even greater than with the tips.
