The Obama administration has been pretty big for data science. It appointed the country’s first official Chief Data Scientist and launched data.gov, a clearinghouse for government-generated public data sets. President Obama also signed an executive order declaring open and machine-readable formats the new default for all forthcoming government information resources, which should keep interesting data coming.
Beyond the practical use of generating actionable conclusions from public data, any repository of up-to-date, varied data sets is invaluable to beginners looking to hone their skills and build a portfolio of projects.
But self-directed amateur research can also present a significant roadblock. Exploratory analysis—essentially poking around in the data and learning interesting things—is important, but the real skill of an employable data scientist is the ability to frame and answer a well-defined question.
So, how do you come up with a good question, a problem to solve, that makes use of this public data? If you’re coming to data science from a relevant domain—economics, or education, or public health, etc.—maybe you have the expertise to do this already.
Personally, coming from a strictly technical background, I don’t feel qualified to judge what makes an interesting question in any policy domain, at least at the scope of these large federal studies. Fortunately for me, data can be found a little closer to home.
A growing number of municipalities are treading the same path and providing open data portals for their public information. The City of Tacoma’s portal is built by Socrata, who make these tools their business, and it provides useful map views and filters, as well as plenty of raw data that can be exported to the analysis tool of your choice.
Here are a few data sets available from TacomaData:
- Tacoma Business Licenses – All the active licenses, their location, business type, and more.
- Pedestrian Crossing Improvements – All active and completed work to upgrade crosswalks and curb cuts, city-wide.
- Electric Vehicle Charging Stations – Location, cost and vendor.
I don’t know the first thing about interstate commerce, national education policy, or trends in agriculture. But I know about business dead zones, about poorly designed intersections and awkward detours, and about gas prices. I know about Tacoma things. Maybe I’m not an expert, but I know enough to come up with questions.
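To make that concrete, here is a minimal sketch of how a question like “which kinds of businesses are most common?” might be answered from a CSV exported from a portal. The column names (`business_name`, `business_type`) and the sample rows are my own placeholders, not the actual schema or contents of the Tacoma data set, so treat them as assumptions to swap out for whatever your export contains.

```python
import csv
import io
from collections import Counter


def count_by_column(csv_file, column):
    """Count how many rows share each value in the given column."""
    reader = csv.DictReader(csv_file)
    return Counter(row[column].strip() for row in reader)


# Invented placeholder rows standing in for a real export.
# These are NOT actual Tacoma records; the column names are assumed too.
sample_export = io.StringIO(
    "business_name,business_type\n"
    "Example Cafe,Restaurant\n"
    "Example Books,Retail\n"
    "Example Diner,Restaurant\n"
)

counts = count_by_column(sample_export, "business_type")

# Print the categories from most to least common.
for business_type, n in counts.most_common():
    print(f"{business_type}: {n}")
```

With a real export, you would pass an opened file instead of the in-memory sample, for example `open("licenses.csv", newline="")`, where the file name is again hypothetical. Counting by category is only a starting point, but it turns “poking around” into a question with an answer.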
So, if you’re struggling to concoct a research project out of all this free government data, I strongly encourage you to think local. Google “[my city] open data” and see what you find. And who knows? At the local level you might even get your findings into the hands of someone who can use them.