Over the last couple of months I have been working on getting some non-personal data sets published in reusable form (as recommended in the Power of Information report). This is all part of the digital engagement strategy, but it is also something close to my heart because of Rewired State and Young Rewired State. I have already seen the huge potential of making this data available: the Rewired State creations page showcases what can be built for next to no money over the course of 12 to 24 hours. Imagine what else is possible!
Not only does it mean that awesome web and mobile apps get built; it also means that the people who actually need to use this information can create what they need, in a way that answers the problem they are trying to solve. Traditionally this would have taken weeks or months of customer insight, with the solution then developed by industry experts. I really like the approach of looking to the digital community to find the people facing whichever problem an organisation is trying to solve, and then finding those who also have the ability to create solutions: the geeks, the coders. Young people are an obvious example, and we have proved the success of that; but it could also be applied to, say, people with long-term illness, a group that I know the Scottish Government is trying to reach and help.
I thought that it might be useful to explain how we in the Home Office have been responding to the recommendation "The government should ensure that public information data sets are easy to find and use", and what we plan to do next.
Starting from a simple remit (locate the non-personal data, find the original source and publish it), we began by looking at all of our publications. Most roads led to the Research, Development and Statistics (RDS) unit. So we, in communications and the Office of the Chief Information Officer (OCIO), started to talk to the statisticians about getting access to the raw data. Because we did not really know what we were asking for, and RDS did not really know what we were asking for or why, we had a series of telephone conversations, email exchanges and finally a good old coffee and a chat. (I tell you what, this is what I love about this work: you get to meet the most incredible people. I had no idea of the work the statisticians do, and I am in awe, and a little bit in love, with what they do.) By the end of that, we had a clear understanding of how data is analysed and released, its varying degrees of complexity, and the statistical implications of disclosure (which basically means that if we go into too much granular detail, there is a chance that individuals or locations could be identified, and that's very bad). Now that we all knew what we were asking for, we had an idea of what we wanted to do, and so we began to do it.
We have separated data into two high-level categories: data that is currently published, and data that is yet to be created.
For data that is currently published, we are working closely with the statisticians to get the raw data, and we are now publishing it at www.homeoffice.gov.uk/data (published by the rather wonderful Carly Moore in e-comms). You will see that we also link to PDFs that contain data, just so that you can see what is coming up. It is working, and we are looking at how we can make it all better, for example by making datasets easier to sort and find.
For data that is yet to be created, we are talking to the relevant parts of the Home Office about the data that is required, and will be publishing this on an ongoing basis. We are also preparing guidance to enable officials to produce future data in a format and to standards that will facilitate its reuse. In the longer term we aim to establish a process whereby data is published in reusable form as a matter of course, and is made available promptly, whilst maintaining appropriate controls regarding the security of personal or sensitive data (in accordance with the Hannigan report).
So that's how we are handling this. Does that help? I hope you will keep an eye on how this progresses. If you build anything with the reusable data, please tag it #honpdata so that we can see what you have done.
I would seriously love to have a developer session where the statisticians and coders work together – that would be alchemy.