Get Your Open Data Project Going!
In the first part of this post we explored why companies should release open data. In this part, I’ll share some thoughts about how you might scope and plan an open data project.
Scope and Possibilities
One thing to consider is that an open data project may be an opportunity to do more than simply publish some data. For example, you could:
- take advantage of the overlap between publishing open data and how you respond to requests under the Freedom of Information Act;
- review your security policies and access control mechanisms while you work out how your databases and applications fit together (the data they store and the data flows between them), either to provide assurance that all is well or to flag up areas which need improvement;
- gain an understanding of how effectively data is currently being used within your organisation, which could lead to projects to improve internal Management Information, or flag up some opportunities for process improvement.
Once you’ve agreed the project’s scope, you will want to create a plan for the work. By all means start with some tasks and milestones, but before you fire up MS Project or your organisation’s de facto reporting system, step back and consider exactly what your deliverables are. What is the true nature of the project? What needs to happen after it ends to keep the open data agenda alive in your organisation?
The Nature of the Beast
You could treat the project as a feasibility study, to be followed by a review and the planning of next steps. There may be benefit in starting the project quietly, without too much fanfare. That way, your first meetings with the colleagues who create or use the data can inform your approach before you decide to give the project a higher profile.
However ambitious the scope, the truth is that the first couple of months of the project will be a voyage into the unknown – how much so will depend upon how self-documenting the organisation already is and how centralised your systems are. So be careful about what you commit to before you understand the nature of the beast. Until you’ve actually worked with a team to understand their systems, processes, business events and data, don’t assume that what initially sounds straightforward (find → understand → publish → maintain data) will be easy.
Finding a Level
If you’re starting with a high-level, system-wide overview, you’ll have to balance the thoroughness of the analysis against the number and complexity of the systems in your organisation. If there are many interlinked systems and teams, understanding the key data flows between systems is essential to understanding the context and ownership of the data.
You start at a high level to help you prioritise where best to focus resources on more detailed analysis, or to find the best place to start looking at how to extract suitable open data. It is often hard to stay at a high level without getting drawn into the minutiae or following up interesting leads. How you monitor progress and react to difficulties or slippage will depend on your stated approach and the scope of the project. What is certain is that once you start, you’ll be under pressure to find and publish some open data as soon as possible.
It can be helpful to think of an open data initiative as a programme or a set of sub-projects, rather than a single self-contained series of tasks with inter-dependencies all neatly arranged between a start and a finish date. Some aspects will be linear (e.g. organise and run a workshop); others will repeat during the life of the project and then continue indefinitely in some other form, perhaps becoming embedded in day-to-day operations (e.g. analyse a system, find and publish data, maintain it). Looking at the project in this modular way from the outset can help you avoid creating overly complex, hard-to-maintain project plans.
Another way of handling the progression from high-level to detailed analysis is to create small, self-contained mini-projects or ‘sprints’. The starting point of a sprint might be “I can see that there is good data in here, but some work is required to extract it”, with the deliverable being the published data. Identify potential sprints as you go along, and deal with them in order of priority. Be careful not to spread yourself or your team across too many sprints at once, since that defeats the point of a sprint as something ‘do-able’ in a relatively short space of time (weeks rather than months). Occasionally you may hear about some new data and think that you could publish it relatively easily, in which case you may decide to put other things on hold for a while to focus on that.
I hope this post has given you some ideas about how to scope and run an open data project. In the final part of this article, we’ll get into the data – what it is and what to do with it!
Blog post by Mike Davies (@dotlineform)
Mike is a Business Analyst at West Yorkshire Combined Authority (WYCA), the official government agency for transport across West Yorkshire. Get in touch with him on Twitter @dotlineform
All views expressed in this article are Mike’s and do not necessarily reflect those of his employer WYCA.