
Boundaries, not barriers

Choropleth map of Scotland

I wrote some recent articles about the state of open data in Scotland. Those highlighted the poor current provision and set out some thoughts on how to improve the situation. This post is about a concrete example of the impact of government doing things poorly.

Ennui: a great spur to experimentation

As the Christmas break ticked by I started to get restless. Rather than watch a third rerun of Elf, I decided I wanted to practise some new skills in mapping data: specifically how to make choropleth maps. Rather than slavishly follow some online tutorials and show unemployment per US state, I thought it would be more interesting to plot some data for Scotland’s 32 local authorities.

Where to get the council boundaries?

If you search Google for “boundary data Scottish Local Authorities”  you will be taken to this page on the data.gov.uk website. It is titled “Scottish Local Authority Areas”  and the description explains the background to local government boundaries in Scotland. The publisher of the data is the Scottish Government Spatial Data Infrastructure (SDI). Had I started on their home page, which is far from user-friendly, and filtered and searched, I would have eventually been taken back to the page on the data.gov.uk data portal.

The latter page offers a link to “Download via OS OpenData” which sounds encouraging.

Download via OS Open Data

This takes you to a page headed, alarmingly, “Order OS Open Data.” After some lengthy text (which warns that DVDs will take about 28 days to arrive but that downloads will normally arrive within an hour), there then follows a list of fifteen data sets to choose. The Boundary Line option looked most appropriate after reading descriptions.

This was described as being in the proprietary ESRI shapefile format, and being 754MB of files, with another version in the also-proprietary MapInfo format. Importantly, there was no option for downloading data for Scotland only, which I wanted. In order to download it, I had to give some minimal details, and complete a captcha. On completion, I got the message, “Your email containing download links may take up to 2 hours to arrive.”

There was a very welcome message at the foot of the page: “OS OpenData products are free under the Open Government Licence.” This linked not to the usual National Archives definition, but to a page on the OS site itself with some extra, but non-onerous reminders.

Once the link arrived (actually within a few minutes) I then clicked to download the data as a Zip file. Thankfully, I have a reasonably fast connection, and within a few minutes I received and unzipped twelve sets of 4 files each, which now took up 1.13GB on my hard drive.

Partial directory listing of downloaded files

Two sets of files looked relevant: scotland_and_wales_region.shp and scotland_and_wales_const_region.shp. I couldn’t work out what the differences were in these, and it wasn’t clear why Wales data is also bundled with Scotland – but these looked useful.

Wrong data in the wrong format

My first challenge was that I didn’t want Shapefiles, but these were the only thing on offer, it appeared. The tutorials I was going to follow and adapt used a library called Folium, which called for data as GeoJSON, which is a neutral, lightweight and human-readable file format.

I needed to find a way to check the contents of the Shapefiles: were they even the ones I wanted? If so, then perhaps I could convert them in some way.

To check the shapefile contents, I settled on a library called GeoPandas. One after the other I loaded scotland_and_wales_region.shp and scotland_and_wales_const_region.shp. After viewing the data in tabular form, I could see that these are not what I was looking for.

So, I searched again on the Scottish Spatial Infrastructure and found this page. It has a Download link at the top right. I must have missed that.

SSI Download Link

But when you click on Download it turns out to be a download of the metadata associated with the data, not the data files. Clicking the “Download via OS OpenData” link, further down the page, takes you back to the very same link as above.

I did further searching. It appeared that the Scottish Local Government Boundary Commission offered data for wards within councils but not the councils’ own boundaries. For admin boundaries, there were links to the OS Boundary Line site, where I was confronted by the same choices as earlier.

Eventually, through frustration, I started to check the others of the twelve previously-downloaded Boundary Line data sets and found there was a shapefile called “district_borough_unitary_region.shp”. On inspection in GeoPandas it appeared that this was what I needed – despite Scottish Local Authorities being neither districts nor boroughs – except that it contained all local authority boundaries for the UK: some 380, not just the 32 that I needed.

Converting the data

Having downloaded the data I then had to find a way to convert it from Shapefile to GeoJSON (adapting some code I had discovered on StackOverflow), then subset the data to throw away almost 350 of the 380 boundaries. This was a two-stage process: first, use a conversion script to read in the Shapefiles, process them and spit out GeoJSON; second, write some code to read in the GeoJSON, convert it to a Python dictionary, match elements against a list of Scottish LAs, then write the subset of boundaries back out as a GeoJSON text file.

Code to convert shapefiles to geojson
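
The second stage described above (read the GeoJSON into a dictionary, match against a list of names, write the subset back out) can be sketched with nothing but the standard library. This is a minimal illustration, not the script pictured above: the property key "NAME", the three example council names and the function names are all assumptions, so check them against the attribute table of the real file before relying on them.

```python
import json

# Hypothetical, partial list for illustration only; the real list has 32 names,
# and the spellings must match the boundary file exactly.
SCOTTISH_LAS = {"Aberdeen City", "Aberdeenshire", "Perth and Kinross"}


def subset_features(collection, wanted_names, name_key="NAME"):
    """Return a new FeatureCollection holding only features whose name is wanted."""
    kept = [
        feature
        for feature in collection.get("features", [])
        # "properties" may legitimately be null in GeoJSON, hence the `or {}`.
        if (feature.get("properties") or {}).get(name_key) in wanted_names
    ]
    return {"type": "FeatureCollection", "features": kept}


def subset_file(src_path, dst_path, wanted_names, name_key="NAME"):
    """Read GeoJSON from src_path, filter it, write it to dst_path, return the count kept."""
    with open(src_path, encoding="utf-8") as src:
        collection = json.load(src)
    subset = subset_features(collection, wanted_names, name_key)
    with open(dst_path, "w", encoding="utf-8") as dst:
        json.dump(subset, dst)
    return len(subset["features"])
```

Comparing the returned count against the expected 32 is a cheap way to spot names that failed to match because of spelling or punctuation differences.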

Using the Geojson to create a choropleth map

I’ll spare the details here, but I then spent many, many hours trying to get the Geojson which I had generated to work with the Folium library. Eventually it dawned on me that while the converted Geojson looked ok, in fact it was not correct. The conversion routine was not producing the correct Geojson.
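
One common cause of this kind of failure, offered here as an assumption rather than a diagnosis of what went wrong above, is that OS boundary data uses British National Grid eastings and northings in metres, while GeoJSON consumers such as Folium expect WGS84 longitude and latitude. The sketch below shows a rough standard-library sanity check, plus a GeoPandas conversion that reprojects on the way out. The function names are hypothetical, and the check is only a heuristic: it looks at coordinate ranges, not at the file's actual coordinate reference system.

```python
def iter_positions(coords):
    """Yield every [x, y, ...] position from arbitrarily nested coordinate lists."""
    if coords and isinstance(coords[0], (int, float)):
        yield coords
    else:
        for part in coords:
            yield from iter_positions(part)


def looks_like_lon_lat(collection):
    """Return True if every position lies within valid longitude/latitude ranges.

    Geometries without a "coordinates" member (e.g. GeometryCollection) are skipped.
    """
    for feature in collection.get("features", []):
        geometry = feature.get("geometry") or {}
        for x, y, *_ in iter_positions(geometry.get("coordinates", [])):
            if not (-180 <= x <= 180 and -90 <= y <= 90):
                return False
    return True


def shapefile_to_wgs84_geojson(shp_path, geojson_path):
    """Convert a shapefile to GeoJSON, reprojecting to WGS84 (EPSG:4326)."""
    # Third-party dependency, imported lazily so the check above needs only
    # the standard library.
    import geopandas as gpd

    gdf = gpd.read_file(shp_path)
    gdf.to_crs(epsg=4326).to_file(geojson_path, driver="GeoJSON")
```

If `looks_like_lon_lat` returns False for a converted file, values in the hundreds of thousands are a strong hint that the reprojection step was skipped.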

Another source

Having returned to this about 10 days after my first attempts, and done more hunting around (surely someone else had tried to use Scottish LAs as GeoJSON!) I discovered that Martin Crowley had republished on GitHub boundaries for UK Administrations as GeoJSON. This was something that I had intended to do for myself later, once I had working conversions, since the OGL licence permits republishing with attribution.

Had I had access to these two weeks ago, I could have used them. With the Scottish data downloaded as Geojson, producing a simple choropleth map as a test took less than ten minutes!

Choropleth of Scottish councils. Data Source: https://www2.gov.scot/Topics/Statistics/Browse/Local-Government-Finance/PubScottishLGFStats/SLGFS201617excel Contains OS data © Crown copyright and database right (2019)

While there is some tidying to do on the scale of the key, and the shading, the general principle works very well. I will share the code for this in a future post.
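
For readers who want the general shape in the meantime, here is a minimal sketch of a Folium choropleth. It is not the code behind the map above. The property key "NAME", the map centre, the zoom level, the colour scheme and both function names are assumptions. Because a mismatch between the names in the boundary file and the names in the data is an easy way to end up with a blank map, the sketch checks for that first.

```python
def unmatched_names(collection, values, name_key="NAME"):
    """Return, sorted, the feature names that have no entry in `values`."""
    names = {
        (feature.get("properties") or {}).get(name_key)
        for feature in collection.get("features", [])
    }
    names.discard(None)  # ignore features that carry no name at all
    return sorted(names - set(values))


def build_choropleth(collection, values, legend, name_key="NAME"):
    """Build a Folium map shading each feature by its value in `values`."""
    # Third-party dependencies, imported lazily so the name check above
    # needs only the standard library.
    import folium
    import pandas as pd

    data = pd.DataFrame(list(values.items()), columns=["name", "value"])
    # Roughly centred on Scotland; location and zoom are illustrative guesses.
    fmap = folium.Map(location=[56.8, -4.2], zoom_start=6)
    folium.Choropleth(
        geo_data=collection,                      # GeoJSON as a Python dict
        data=data,
        columns=["name", "value"],                # key column, then value column
        key_on=f"feature.properties.{name_key}",  # where the key lives in the GeoJSON
        fill_color="YlGnBu",
        legend_name=legend,
    ).add_to(fmap)
    return fmap
```

A typical use would be to call `unmatched_names` first, fix any spelling differences it reports, then call `build_choropleth(...)` and `.save("map.html")` on the result to view it in a browser.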

Some questions

There is something decidedly user-unfriendly about the SDI approach which is reflective of the Scottish public sector at large when it comes to open data. This raises some specific, and some general questions.

  1. Why can’t the Scottish Government’s SDI team publish data themselves, as the OGL facilitates, rather than have a reliance on OS publishing?
  2. Why are boundary data, and by the looks of it other geographic data, published as ESRI GIS shapefiles or Mapinfo formats rather than the generally more-useable, and much-smaller, GeoJson format?
  3. Why can’t we have Scottish (and English, and Welsh) authority boundaries as individual downloads, rather than bundled as UK-level data, forcing the developer to download unnecessary files? I ended up with 1.13GB (and 48 files) of data instead of a single 8.1MB Scottish geojson file.
  4. What engagement with the wider data science / open community has the SDI team done to establish how their data could be useful, useable and used?
  5. How do we, as the broader Open Data community, share or signpost resources? Is it all down to government? Should we actively and routinely push things to Google Dataset Search? Had there been a place for me to look, then I would have found the GitHub repo of council boundaries in minutes, and been done in time to see the second half of Elf!

And finally

I am always up for a conversation about how we make open data work as it should in Scotland. If you want to make the right things happen, and need advice, or guidance, for your organisation, business or community, then we can help you. Please get in touch. You can find me here or here or fill in this contact form and we will respond promptly.

Response to Scotland’s Draft Action Plan on Open Government

The Scottish Government published its draft action plan on 14th November 2018. You can find it here. They are seeking feedback before the 27th November 2018.

Here is my feedback which I sent on 25th November.


 

Thank you for the chance to feed back on the drafts of the Scottish Open Government Action Plan and Commitments.

These documents are welcome and while they certainly set a path for moving Scotland further in the right direction in terms of openness and transparency, we should remember that those should not be our only aims. We need to ensure that we also address the need to use data and information to fuel innovation, and deliver societal and economic benefits for Scotland.

I have set out below my observations and suggestions in a number of areas which range from the general to the specific.

The public good

Data and information held by the Scottish Government and the public sector should be considered a Public Good. See https://www.nic.org.uk/wp-content/uploads/Data-for-the-Public-Good-NIC-Report.pdf and https://www.gov.uk/government/publications/data-for-the-public-good-government-response/government-response-to-data-for-the-public-good.

To deliver that public good requires freeing up information and data as a matter of course, rather than by exception.

There is one simple thing that could be done with immediate impact, and minimal effort, to free up large amounts of data and information for public re-use: adopt an Open Government Licence (OGL) for all published website information and data on the Scottish Government’s website(s), the only exception being where this cannot legally be done, as would be the case when personal data is involved.

The Scottish Information Commissioner’s own website (http://www.itspublicknowledge.info/home/TermsAndConditions.aspx) takes this approach: “Where the Commissioner is the copyright holder, information is available through the Open Government Licence. This means you have a worldwide, royalty-free, perpetual, non-exclusive licence to use the information, subject to important conditions set out in the licence.”

At present, websites operated by Scottish Government, local authorities, health boards etc.  all appear to have blanket copyright statements. I certainly could find no exception to that. With OGL-licensed content, where data is not yet available as Open Data (OD), a page published as HTML could be legitimately scraped and transformed to open data by third parties as the licence would permit that. Currently pages such as this list of planning applications, https://publicaccess.aberdeencity.gov.uk/online-applications/simpleSearchResults.do?action=firstPage contain valuable data but are caught by default, site-wide copyright statements.

Of course, in reality citizens, companies, universities and organisations do scrape website content, but it is done under the radar. This approach results in repeated scraping as the results are not published as open data, and there is consequently limited public benefit. Switching the licensing model to OGL by default, and copyright by exception,  would solve this and encourage both innovation and engagement: moving a supplier / consumer relationship to one where data and information are a shared public good.

The Scottish Government should mandate this approach not just for the whole of the public sector but also for companies performing contracts on behalf of Government, or who are in receipt of public funding or subsidy.

Targets for publishing

The Scottish Government’s own Open Data Strategy 2015 commits it to publishing data openly but, despite my efforts and those of other contributors to it, the strategy mostly lacks hard targets, and sets overly-modest goals: “The ambition is for all data by 2017 to be published in a format of 3* or above.” One could ask if all of Scottish Government’s data was actually published to 3* standard by the end of 2017. If not, how much? Who knows – is this even measured, reported on or published?

Therefore, any new action plan should have harder, more specific targets. It is arguable that the lack of these, and of a clear Open Data Policy for Government, as I called for in 2015, allows overly-pressed civil servants to have much less focus on publishing open data than is needed, resulting in inadequate resources being applied to that. So, ideally this action plan should be underpinned by policy for the whole of the Scottish public sector to ensure that effort and resource can be targeted on publication.

To support this, the public benefits of open data publishing, both in social and economic terms, should be made clear to all data publishers.

Every FOI request should be assessed on receipt, identifying whether it is for data or whether data publishing would satisfy that and future similar requests. If so, the data set should be set for publication as OD with regular periodic updates.

Statutory obligations

I looked for, but could not see, in the action plan and other documents, an acknowledgement of the current statutory obligations on the Scottish Government in this area. Recognising, noting and commenting on these in the document would be a useful reminder of specific existing obligations but would also strengthen broader arguments for OD. The following list is not exhaustive.

There are obligations under the G8 Charter on Open Data https://www.gov.uk/government/publications/open-data-charter.

Further, there are existing clear obligations under The Re-use of Public Sector Information Regulations (2015) https://www.legislation.gov.uk/uksi/2015/1415/contents. There is a handy guide here:

http://www.nationalarchives.gov.uk/documents/information-management/psi–guidance-for-public-sector-bodies.pdf (see pages 22 onwards in particular).

Where specific legislation mandates open publication then this should be made clear, as is the case, for example, under the Public Services Reform (Scotland) Act 2010, if only to avoid this type of headline: https://www.heraldscotland.com/news/17238918.snp-ministers-missing-their-own-transparency-target/

Another example is the OECD’s “Compendium of good practices on the publication and reuse of open data for Anti-corruption across G20 countries: Towards data-driven public sector integrity and civic auditing”.

https://www.oecd.org/gov/digital-government/g20-oecd-compendium.pdf

Recommendations and best practice

There are many resources available online which demonstrate best practices which Scotland’s public sector should adopt in order to deliver the aims of the action plan. Again, these should be mandated for adoption in the action plan. Some examples follow.

Discoverability

A key part of publishing information and data openly is discoverability. To do this well means understanding and applying best practices. Having standard identifiers, descriptors, taxonomies etc. will aid discoverability.  So, all information and data publishing should use best practice, using the correct metadata and appropriate standards such as DCAT / DCAT-AP / DCAT.json.

There are some useful resources to assist in this such as

The Scottish Government has an internal expert on this, who sits on the international standards board. It is imperative that his input is sought, and implemented rigorously, in terms of this application of standards.

Data as infrastructure

We should acknowledge the concept of data as infrastructure. See https://www.nic.org.uk/wp-content/uploads/Data-As-Infrastructure.pdf and https://theodi.org/topic/data-infrastructure/. Publishing to the best of our ability, based on standards and best practice, will allow new products and services to be developed for societal and economic benefit, and support innovation.

Reference Data

By using standard identifiers for things, such as UPRNs for properties, USRNs for roads and so on, data from multiple government sources can be aggregated about that object, and we can link items with certainty. If the identifiers are then made public, external data such as those from the private sector, can be amalgamated. There must be a concerted effort to make these identifiers public and re-usable. Instead of what appears to be a starting position of “we can’t do this because of x ” we must shift to “how can we do this and how can we sweep away barriers?” Where no identifiers exist for a specific domain, but it is identified that there would be benefit from having them, these should be created.

General approach to open data

Open Data is not a separate thing or process. The curation, management and publication of data is a continuum starting with the internal processes of the organisation. OD should be seen as the natural end point for all data where it is appropriate to publish openly. By adopting an open data by default approach, as outlined here  https://en.wikipedia.org/wiki/Open_by_default effort is expended on publishing, not on finding a reason or way to publish: data will be published as OD unless there are specific legal reasons why it can’t be. There are additional benefits to this, including improvements in data quality, de-duplication  and re-use of data internally by other departments or services.

Further, while the draft action plan focuses on publishing statistical data openly, it needs to be recognised that the scope must be much wider: encompassing all branches of the Scottish Government, its directorates, its NDPBs, and other agencies. SG also needs to act as a leader to health boards, local authorities, and joint health and social care partnerships, and work with others such as the Scottish Cities Alliance where work is ongoing.

We need to open up reference data, geographical boundaries, transactional data, financial data, in fact anything that need not be closed by default.

National portal

Scotland lacks a national open data portal. While this is not a necessity, in order to aid discovery, it would be an advantage, particularly when we have a growing number of existing places where data is being published across Scotland. Many other countries have national portals (https://www.opendatasoft.com/a-comprehensive-list-of-all-open-data-portals-around-the-world/ ) and some such as Austria have had a federation model of publishing at various levels of government in place for many years. If we get discoverability right, and tools such as Google’s data search engine (https://www.google.com/publicdata/directory) begin to mature, this may be less of an issue.

Geospatial commission

Both the recently-formed Geospatial Commission and the rapidly changing stance of Ordnance Survey are going to impact on what we can publish – with barriers being removed. This increased liberalism will mean that data which we could not publish 3 months ago will suddenly be publishable. Scottish Government needs to be on top of that and acting on it to push out data as soon as it can. Beyond that, it should be routinely pushing OS on issues such as derived data to ensure that barriers to publishing are actively removed. Similarly, if reference data is opened up at a UK level, then the Scottish portion of that data needs to be highlighted by the Scottish Government.

Community Building

The action plan must include commitments to work with the Open Data community in Scotland. It is smaller than it should be since there has been relatively little data of value to work with up to now. Contrast this with the position of Transport for London, one single organisation, whose open data as far back as 2013 was reported to be responsible for 5,000 developer jobs and 500 apps. The Scottish Government needs to grow the OD community and develop it by being an active part of it: to actively seek input on what data sets would be most useful; to use the community as a sounding board; and to gain the trust and support of the community by empowering them to be infomediaries who will build and develop products and services which enable citizens to use the data produced, and make sense of it.

Supporting education

Finally, the publication of open data needs to be seen as an educational resource too. Data should be available for use by schools, colleges and universities. Curricular development should encompass the use of open data. Outreach should work with teachers and lecturers so that children can understand their locality by using data pertinent to them. Honours-year and post-grad students in computing sciences should use open data in their projects. Innovation and entrepreneurship courses should encourage the use of public data. Journalism courses should teach data journalism, and so on.

Ian Watt

etc

 

Open Data Scotland – a nudge from OD Camp?

Drawnalism captures open data sketch notes

Over the first weekend of November 2018, just over 100 people congregated in Aberdeen to attend the UK Open Data Camp. We’d pushed hard to bring it to Scotland, and specifically Aberdeen, for the first time. The event, the sixth of its type, which follows an unconference model where the attendees set the agenda, has previously taken place in England, Wales and Northern Ireland.

I’m not going to go through what we did over the weekend; you can find plenty of that here and here. There are links to all 44 sessions which took place on this Google doc, and many of those have collaborative notes taken during the sessions.

Instead this is a reflective piece, seeking to understand what OD Camp can show us about the state of Open Data in Scotland and beyond.

Who was there?

Of the 100+ attendees, including camp-makers, we estimate that about 40 were from the public sector. Getting exact numbers is hard – people register in their own name, with their own email addresses, but we think that is a good guess.

While this sounds good, during the pitching session on the first day Rory Gianni asked a question: “Hands up who is here from the Scottish public sector?” Two people’s hands went up out of 100+. Both were from local authorities, Aberdeen and Perth city councils, and a third person (also from Aberdeen) joined later on Saturday.

This is really concerning and shows the gulf between what Scotland could, or rather must, be doing and what is actually happening.

The Scottish Public Sector

It is estimated that the Scottish Civil Service encompasses 16,000+ officers. It encompasses 33 directorates,  nine executive agencies  and around 90 Non-departmental public bodies (NDPBs) plus other odds and ends such as the Crown Office and Procurator Fiscal Service.

Then we have 14 health boards, 32 local authorities, 32 Joint Health and Social Care Partnerships and so on.

All of these should be producing open data.

Reality

Sadly, we are very far from that. Few are of any scale or quality. I’ve written about this extensively in the past including in this blog post and its successor post.

So, if we use attendance by the Scottish public sector, at a free-to-attend event which was arranged for them on their very doorstep, as a barometer of commitment to open data, it is clear that something is rotten in the state of Denmark, or rather Scotland.

Three weeks on

Since the event, I’ve reached out to the Scottish Government through two channels. I contacted Roger Halliday, the Chief Statistician and the senior civil servant with responsibility for open data, and responded to a Twitter contact from Kate Forbes, the Minister for Public Finance and Digital Economy.

I then had an hour-long conversation with Roger and two of his colleagues. This was a very positive discussion. I took away that there is a genuine commitment to doing things better, underpinned by a realism about capacity and capability to widely deliver publication and engagement with the wider OD community. I have agreed to be part of a round table meeting on OD to be held in the new year – and have expressed a commitment to assist in any way needed to improve things.

Meanwhile

Ironically, in the midst of this three week period, the Scottish Government published its Open Government action plan. This emerged on 14th November and is open for feedback until 27th November. So, if you are quick, you can respond to that – and I encourage you to do so. While this certainly seeks to move things in the right direction in terms of openness and transparency, it is extremely light on open data and committed actions to address some of the issues which I have raised.

My next blog post will be a copy of the feedback which I provide, and on which I am currently working.

And finally

When I started drafting this post I was in a very negative frame of mind as regards the Scottish Open Data scene – and particularly in terms of the public sector. In the intervening period, I  launched the Scottish Open Data Action group on Twitter. The thinking  behind this was to get together a group of activists to swell the public voice beyond mine and that of ODI Aberdeen.

Given the way things are moving on with the Scottish Government and the positive engagement that has begun, the group, which is in its infancy, may not be needed as a vocal pressure group. Instead we could be a supportive external panel who provide expertise and encouragement as needed. Who knows – let’s see!

Are Twitter closing the door on the educational sector?

Sorry - we are closed

Changes to Twitter’s terms of use for developers mean that universities, online tutors, and authors of instructional textbooks may no longer be able to teach students how to mine Twitter data in the way that they have done for several years. So how will the next generation learn how to use Twitter data? Ian Watt is concerned. 

If you’ve read any book on gathering Twitter data via the API, it is highly likely that you will have seen a fairly standard set of instructions, such as these from Python Social Media Analytics by Chatterjee and Krystyanczuk [PacktPub]:

  1. Create a Twitter account or use your existing one.
  2. Go to https://apps.twitter.com/ and log in with your account.
  3. Click on Create your app and submit your phone number. A valid phone number is required for the verification process. You can use your mobile phone number for one account only….

That’s how I learned it from books, and that’s how it was taught in our Data Science Course, but it appears that access to that method is now closed.

As computing science faculties around the world start to welcome a new intake of students, they need to face up to a difficult period ahead.

The Party’s Over

Towards the end of July 2018 I was completing my MSc Data Science project, a semester-long piece of work which relied heavily on access to Twitter’s API for its data. When I went to add an application to my Twitter account I got a nasty surprise!

If you visit https://apps.twitter.com/ now, you will find that you cannot create a new App unless you have a full developer account.

This was announced on 24th July, 2018,  and the developer documentation was updated at the same time.

“Starting July 24th, 2018, anyone who wants to create a new Twitter app will need to have an approved developer account. You can apply for a developer account at developer.twitter.com. Once your application has been approved, you’ll be able to create new apps on developer.twitter.com.”

It should be noted that pre-existing apps, created under existing accounts using the former method, will still work for now, but unless you are the owner of an approved developer account you will no longer be able to create new apps, and so will not be able to get authorisation tokens to run your new project.

As an experiment, I applied to have my account upgraded to a Developer one at the end of July to see how long it would take. As of today, 27th August, I am still waiting.

Why does it matter?

This change creates real challenges to Computer Science departments in universities and colleges world-wide.  If you teach big data, social media analytics, data mining, data science or similar, here are a bunch of questions for you:

  • How are your students going to learn how to use Twitter data now?
  • How will your R and Python Data Science courses teach how to use data which is no longer readily available to students?
  • If you continue to teach this, how long will it take students to obtain Developer accounts?
    • How can you guarantee that they will get them?
    • And what if they don’t?

And for developers, there are further issues ahead. The number of apps you can register is being reduced and the rate limiting is getting tighter in the next couple of weeks.

I understand why Twitter is doing this, and I respect their attempts to tackle real issues by removing bots, fake accounts etc. but, as with all big decisions, it does appear that there are unexpected consequences.

A final call to action

Who, in the academic community, has faced up to these issues? Is there a back-channel to Twitter to ensure that students can still be taught to use Twitter data responsibly? Who is lobbying Twitter on behalf of the educational sector?

We all need to raise these issues and ensure continuity of data science education.

Header photo by Tim Mossholder on Unsplash

Using Lego in Workshops

Lego play

At our co-design sessions we sometimes use Lego bricks in our workshops. What?!? I hear you saying. We’re not kids, why should we use Lego at professional events? Well, there are lots of good reasons to use Lego bricks in workshops.

We use the Lego bricks as part of the Lego Serious Play process to tease out ideas from participants. After they’ve been thinking and writing on post it notes, having them build a few models to illustrate the combination of their ideas as a physical object makes it easier to share their ideas with others. Each person tells the story of their model to the others in their group, who then can ask questions about the model. As this is ‘just a model’ it doesn’t have to be perfect. It just has to capture the key qualities the person wants to emphasise. From this state we can also build on these ideas more with either more post its, or by building more models to represent more aspects of the issue and its context.

The key to this is the power of the Lego Serious Play approach, which includes the following:

  1. It’s not words so it’s easier to capture the essence of the idea and develop it further.
  2. It’s visual and thus easier to share with others.
  3. It is easier to build on each other’s ideas.
  4. It is playful, and thus speeds up the ideation process, and is more memorable so you’ll remember more compared to using only written words in the session.

You can find out more about using the Lego Serious Play process with CTC events, and about the LSP facilitation training, on my other blog. Get in touch too if you’d like us to use LSP in a workshop with you.