Interview with David Stephenson – the Public Data Guy
April 1, 2010
I caught up with W. David Stephenson, a guy who’s very excited about government open-data projects, as he was on his way to see a publisher about his next book on the same topic. One of his recent speeches was entitled, “Let my Data Go: the Case for Transparent Government.” You can find him at Stephenson Strategies. I was excited to ask him about open data for government. Here is our interview.
What democratizing data efforts have you worked on?
I got a chance to be in on the ground floor of democratizing data, when Vivek Kundra was the District of Columbia’s CTO. He had already established their “Citywide Data Warehouse,” which released nearly 300 data streams, many of them on a real-time basis. That was the model for the data.gov site that he launched when he became the US CIO. I was brought in to help design a strategy to make the District even more transparent, and, most important, to make these data streams really helpful to people.
Tell us what trends you’ve seen in data over the past few years – what’s getting traction?
The President’s Open Government Directive, which set aggressive timetables for agencies to release a variety of “high value” data sets through the new Data.gov site, is likely to lead to valuable new location-based services (LBS) and other exciting uses. Data.gov grew from 47 data sets when it was launched in May to more than 100,000 by late July. Clearly, people are interested in sharing data. This circumvents years of consultants and one-off projects to integrate two separate systems. Now anyone can come get it and use it however he or she likes.
It’s also important to note this is not just a US initiative: there is friendly competition among a number of national governments such as the UK, Australia, and New Zealand to release data, and individual cities such as Vancouver and Seattle are following the District of Columbia’s lead. The great thing is that each can and does learn from the others, and it is easier and easier for latecomers to get up to speed.
What’s the difference between cloud computing and linked data?
They go hand-in-hand, but unless you are using linked data you simply won’t realize the full potential of cloud computing. You can share all sorts of unstructured data and other content via cloud computing, but it’s only when the data is linked that we realize the full power of the cloud.
What are the biggest sources of confusion you see out there?
No PR! People don’t realize how much they could save and how much they can do with linked data. For example, because most of the use of the eXtensible Business Reporting Language (XBRL) has been for regulatory filings, companies haven’t realized this is a license to save money, especially if they use XBRL Global Ledger (GL), which can link all sorts of internal operating data (and let that data flow seamlessly into those reports as well). For the first time, that would allow them to share data with everyone who needed it, on a real-time basis. Unfortunately, most companies farm out the XBRL reporting to their accountants, which means they have an extra expense and realize no benefits. What a waste!
What do you think people need to focus on that they are not?
Most important, for organizations, is internal use of linked data: even when government agencies and companies are sharing their data externally, they aren’t realizing its full potential to improve internal operations.
Which US cities are most progressive? Are there any good models for government data?
There are several cities that have done a lot: Washington DC (because of Kundra’s leadership), Seattle, Vancouver (which you mention in your book), and San Francisco are all good models for opening their data. However, my favorite is Manor, TX. Manor is a small (under 6,000) suburb of Austin. Their CIO, Dustin Haisler, is only 23, and has a laughably small IT budget. However, he has attracted global attention because of his innovative policies, most notably his use of Quick Response (QR) Codes — open source bar codes that the city uses to track all of its real property and to share information with citizens. Just hold a cell phone in front of one of the codes, and all sorts of relevant structured data about the building project, historic building or other property is immediately available.
Can you tell us which other countries are most progressive in opening data?
Probably the country that has attracted the most attention, because of the star power behind it, is the UK, where Web inventor Sir Tim Berners-Lee has advised the Brown government on creation of Data.gov.uk. When I spoke on the subject in New Zealand last spring, I learned their real-time traffic feeds have led to a variety of valuable services for motorists. Australia also has an extensive effort under way.
Do you think there should be a sort of UN of data? If so, what would it look like? Is there anything today that comes close?
Yes, I do, because this would allow countries and businesses all over the world to access and compare data for uses such as determining best practices on delivering services and how to turn data into new services. The best example today is the 30-nation Organisation (sic) for Economic Co-operation and Development, which not only publishes its data, but goes the extra step of posting it on two sites, Swivel.com and Many Eyes, which allow the general public to easily visualize and work with data. I think it’s noteworthy that back when I first learned about XML around 2000, the U.N. Development Programme was one of the prime backers of the initiative, I assume because of its potential for international development and trade.
What are the big things you see coming in the future? How will things look different five years from now?
I think the biggest change will be when there’s much more consciousness of the benefits of the Semantic Web (as you document so well in Pull!) and when data is routinely structured as soon as it is entered. When that happens, we will see unprecedented creativity and innovation, because of the infinite variety of individual interests and experiences: 10 people will see the same 20 data feeds and mash them up in 200 different ways because they’ll each identify different needs and benefits.
Cool. What dream projects would you do if you had the funding?
I’m convinced, as I wrote in an op-ed around tax time last year, that there would be huge benefits in creating an easily customizable data-entry form modeled on the TurboTax interface, where business rules would operate invisibly to the user and tag the data the first time it was entered. If it were interactive, the user would be quizzed to make sure the data was accurate, then the tagged data would automatically flow everywhere: entered only that one time, but used everywhere.
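(Editor’s aside: the “enter once, tag at entry, use everywhere” idea can be shown in a few lines of Python. This is only a rough sketch, not anything David or TurboTax has built. The field names, tag names, and the `RULES` table are all hypothetical, invented for illustration.)

```python
from dataclasses import dataclass

# Hypothetical business rules: each form field maps to a tag and a validity check.
# The user never sees these; they run behind the form.
RULES = {
    "wages": ("income:wages", lambda v: v >= 0),
    "charity": ("deduction:charitable", lambda v: v >= 0),
}


@dataclass(frozen=True)
class TaggedValue:
    tag: str
    value: float


def enter(field: str, raw: str) -> TaggedValue:
    """Validate a raw form entry and attach its tag the first time it is entered."""
    tag, is_valid = RULES[field]
    value = float(raw)
    if not is_valid(value):
        raise ValueError(f"{field}: {raw!r} failed validation")
    return TaggedValue(tag, value)


def select(entries, prefix):
    """Any downstream report pulls what it needs by tag, with no re-entry."""
    return [e for e in entries if e.tag.startswith(prefix)]


# Data is typed in once...
entries = [enter("wages", "52000"), enter("charity", "300")]

# ...and reused by two different "reports" that only look at tags.
income_total = sum(e.value for e in select(entries, "income:"))
deductions_total = sum(e.value for e in select(entries, "deduction:"))
```

The point of the sketch is that the two totals at the end never ask the user anything: they find their inputs by tag, which is what lets one entry feed many uses.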
What strategic advice would you give to people trying to figure all this stuff out? What resources are available to them?
Well, let me start with a little unpaid plug to buy “Pull.” Even though I’ve known the technology behind the Semantic Web for a long time, it was only when I read your book that I really “got it,” about how this would affect absolutely everything we do. Beyond that, there are several sources that I rely on for innovation in this field:
- tweets (I’ve compiled a Twitter List of my favorite “democratizing data” sources)
- Sunlight Foundation blog
- Data.gov blog
- Gov 2.0 Radio — great podcasts.
I rely on David to learn about innovation. Be sure to spend some time at his site, Stephenson Strategies.