If you’ve read any book on gathering Twitter data via the API, it is highly likely that you were given a fairly standard set of instructions, such as these from Python Social Media Analytics by Chatterjee and Krystyanczuk [PacktPub]:
Create a Twitter account or use your existing one.
Click on Create your app and submit your phone number. A valid phone number is required for the verification process. You can use your mobile phone number for one account only….
That’s how I learned it from books, and that’s how it was taught in our Data Science Course, but it appears that access to that method is now closed.
As computing science faculties around the world start to welcome a new intake of students, they need to face up to a difficult period ahead.
The Party’s Over
Towards the end of July 2018 I was completing my MSc Data Science project, a semester-long piece of work which relied heavily on access to Twitter’s API for its data. When I went to add an application to my Twitter account I got a nasty surprise!
If you visit https://apps.twitter.com/ now, you will find that you cannot create a new App unless you have a full developer account.
“Starting July 24th, 2018, anyone who wants to create a new Twitter app will need to have an approved developer account. You can apply for a developer account at developer.twitter.com. Once your application has been approved, you’ll be able to create new apps on developer.twitter.com.”
It should be noted that pre-existing apps, created under existing accounts using the former method, will still work for now, but unless you are the owner of an approved developer account you will no longer be able to create new apps, and so will not be able to get authorisation tokens to run your new project.
As an experiment, I applied to have my account upgraded to a Developer one at the end of July to see how long it would take. As of today, 27th August, I am still waiting.
Why does it matter?
This change creates real challenges for Computer Science departments in universities and colleges worldwide. If you teach big data, social media analytics, data mining, data science or similar, here are a bunch of questions for you:
How are your students going to learn how to use Twitter data now?
How will your R and Python Data Science courses teach how to use data which is no longer readily available to students?
If you continue to teach this, how long will it take students to obtain Developer accounts?
How can you guarantee that they will get them?
And what if they don’t?
And for developers, there are further issues ahead. The number of apps you can register is being reduced and the rate limiting is getting tighter in the next couple of weeks.
I understand why Twitter is doing this, and I respect their attempts to tackle real issues by removing bots, fake accounts etc. but, as with all big decisions, it does appear that there are unexpected consequences.
A final call to action
Who, in the academic community, has faced up to these issues? Is there a back-channel to Twitter to ensure that students can still be taught to use Twitter data responsibly? Who is lobbying Twitter on behalf of the educational sector?
We all need to raise these issues and ensure continuity of data science education.
At our co-design sessions we sometimes use Lego bricks in our workshops. What?!? I hear you saying. We’re not kids, why should we use Lego at professional events? Well, there are lots of good reasons to use Lego bricks in workshops.
We use the Lego bricks as part of the Lego Serious Play process to tease out ideas from participants. After they’ve been thinking and writing on Post-it notes, having them build a few models to illustrate the combination of their ideas as a physical object makes it easier to share their ideas with others. Each person tells the story of their model to the others in their group, who can then ask questions about the model. As this is ‘just a model’ it doesn’t have to be perfect. It just has to capture the key qualities the person wants to emphasise. From this state we can also build on these ideas more with either more Post-its, or by building more models to represent more aspects of the issue and its context.
The key to this is the power of the Lego Serious Play approach, which includes the following:
It’s not words so it’s easier to capture the essence of the idea and develop it further.
It’s visual and thus easier to share with others.
It is easier to build on each other’s ideas.
It is playful, and thus speeds up the ideation process, and is more memorable so you’ll remember more compared to using only written words in the session.
There is an oft-repeated joke in which a tourist, completely lost in the Irish countryside, asks an old fellow who is leaning on a gate at the edge of a field, “Can you tell me how to get to Dublin?” After a long pause, the old guy replies, “Well, you don’t want to start from here”.
Previously, I covered open data in Scotland from 2010 to the present. Now I look ahead, but to get there we need to start from where we currently find ourselves.
Scottish open data publishing – now
Earlier this week I spent a couple of hours pulling this list together as a first snapshot of the current open data publishing landscape. The intention is to present an accurate precis of the current state, within the available time to do the research. If I have missed anything, or got it wrong, let me know and I will fix it.
There have been sporadic attempts – of varying size, cost, and success – to make Scottish open data available. How these were initiated or funded varies. Examples include bodies such as Nesta, individual local authorities, groups such as the Scottish Cities Alliance (SCA), and the Scottish Government.
It appears that, at the time of writing, the SCA programme (which is scheduled to run from Jan 2017 to Dec 2018) has so far delivered new open data portals for Dundee, Perth, Inverness and Stirling. Some of these have started to publish a few data sets and others, 18 months into the programme, are still waiting to do so. Aberdeen, who dropped out in late 2017, announced in May of this year that they were back on board, but so far there is no sign of anything being delivered. Even the open data landing pages which Aberdeen City Council once hosted have been removed, although I have heard mention of some GIS open data due to be released.
Edinburgh and Glasgow had existing portals prior to the SCA programme. In Edinburgh’s case, while it has an impressive 234 datasets, only four of these have been updated in the last six months, and no new data sets have been added for over 15 months.
It looks like Glasgow’s open data platform is a new one, replacing the one created as part of the TSB funded £20m+ future cities programme (PDF. Links to original site have disappeared). It used to host over 370 data sets. The new one has far fewer: 72. While a number of these have been added to the new portal this year, many of them are historic: e.g. house sales data only go to 2013, which suggests that these are ported from the old site and not updated. It also suggests that around 300 data sets have vanished (temporarily, we hope)!
Some considerable recent attention, and an award, has been given to a project carried out on business rates data by North Lanarkshire Council (NLC) with partners Snook and Urban Tide. This is part of a programme funded by the ODI, and the press coverage reiterates NLC’s claim to have an open by default policy. I know both Urban Tide and Snook, and their work – so I am sure that it will be great. In researching this, though, I could find no data.
In response to my enquiries NLC told me that they are testing a platform. Interestingly, Edinburgh has claimed in the past to have an open-by-default policy for data too, which I cannot locate. Sadly this position is not supported by their own portal’s current condition.
Similarly, Renfrewshire have an Open Data in Renfrewshire page, which states that “The Council is taking a lead role in complying with the Scottish Government’s Open Data Strategy”, and whose Dublin Core metadata show it was created, and last updated, in April 2016. They have a 25-page strategy dated 2015 with a commitment to open data by default, but NO open data that I can find; not even an entry in their website A-Z.
When we created the business case for the SCA data programme, I was quite clear that each of the 7 local authorities was procuring a portal for their city, not for the council. This is an important point. When local councils fail to provide a platform, and data, it is not just the local authority’s image that is tarnished – they are failing citizens, academia and businesses alike.
Where can we see best practice in action?
Sadly, the answer isn’t in Scottish local government, at least for now. Perhaps, when the SCA project reaches its conclusion in December, there will be more to show for it. Let us hope.
The Scottish Government also has its Scottish Spatial Data Infrastructure hub which presents geospatial data for both local authority and Scottish Government. This is a welcome resource but is not without its challenges. I’ve not found a way to search by licence (as it appears that not all data is licensed for reuse) and some of the data formats (e.g. WMS or WFS) are more suited to other specialists rather than the general public.
If you know of other high quality examples which I have missed, please let me know.
What stops publishers doing better?
I have had many conversations about this over the years. Since I wrote part one of this mini-series several people contacted me with their thoughts about the Scottish Public Sector’s approach to OD.
Issues which get in the way of doing it right (in no particular order) include:
Lack of awareness (or deliberate ignoring) of legal commitments to provide the data
No open data policy, so it is easy to not do it.
No organisational commitment
A lack of understanding by, and therefore no support from, senior managers / elected members
Short-termism. Too frequently, OD is delivered as a project, not a long-term commitment
No clear responsibility for OD, or the wrong people / roles with responsibility
Lack of awareness of benefits (to organisation, to economy, to society)
Lack of capacity or lack of skills
Lack of engagement with wider data community
Imagined barriers, or no drive to overcome them
Poor data management, and / or siloed structures within the organisation
Data hoarding by services (“data is power and I am not giving mine up”)
Legal restrictions on publishing (real or imagined)
I can’t deal with all of these in this post – and many are cultural, and need to be resolved by the organisations themselves, but I will address a few of these below. It should also be noted that the G8 Charter on Open Data from 2013, and the Scottish Government’s 2015 Open Data Strategy (PDF), mean that not publishing is simply not an option.
But, licensing …
While not all open data is geospatial, a significant proportion is, and a particularly useful one at that. A common barrier which is raised when electing not to release geospatial data is the licensing restrictions imposed by Ordnance Survey. Sometimes these are genuine issues but on occasion these difficulties are either thrown up by over-cautious individuals or those who can’t be bothered to research and tackle them.
I do recognise that the issue is a complex one but it is worth comparing the likes of the Surrey Planning Hub, which offers a developer-friendly API returning fully-geocoded planning application data for all local authorities in an entire county, with – for example – the Scottish Spatial Hub, which hosts 27 amalgamated spatial datasets for the 32 councils. Only three of these are open data. If you try to download the Planning Application data (cf. Surrey) you are asked for an authentication key. If you try to register for one you are informed that you can only do so if you work for a local authority.
If anyone can explain why Surrey and Hampshire Hub, and other English authorities such as Camden, can offer downloads of planning open data of this quality and Scotland can’t, I would love to hear it. At its heart I believe there is a misunderstanding about the OS Licence for Derived Data and presumption to publish.
This recent blog post by Ben Proctor, based on work at OD Camp Belfast, gives a good set of guidance, and some debunking of myths. His summary hits the nail on the head: “The vast majority of derived data based on OS information can just be published by public bodies under this ‘presumption to publish’.”
An announcement last week by Ordnance Survey points in the direction of further openness and a more permissive licensing regime (see this post by Owen Boswarva) and this is ahead of the formation and work of the new Geospatial Commission (GC).
So, perceived licence issues will soon no longer be a barrier behind which the mis-informed can shelter. If I were working in local government data, or in a Scottish Government directorate, I would be proactively planning now how I am going to start to publish it.
Of course, the issues are not just with publication.
The aim of the Aberdeen meet-up is to create that city-region local data community: bringing together interested, engaged participants from academia, citizens, community groups, developers, councils, Scots Govt departments, private companies and others. Open data is a large part of that conversation as well as data science and other related topics.
Activity such as that should be happening in each of the seven cities, and across Scotland more generally. While it doesn’t have to be driven by the local council – ours wasn’t – it should open up a meaningful dialogue with authorities: demonstrating need, prioritising specific data, providing feedback, creating opportunities for data use, identifying data in others’ hands, providing advocacy etc.
When we created the Scottish Cities Alliance’s Open Data programme, one of the four planned work streams, which was well-funded, was the nurturing of local data communities. Our aim was to move from the position of council as provider, and citizen / developer as consumer, of data, to one of all interested parties working together. As I said in that piece, “Going beyond publication, the true value of open data will be realised in its re-use and in the innovative uses to which it is put. The SCA partners will work to develop city-region open data eco-systems where the public, third and private sectors collaborate to encourage data use, economic stimulation and creative approaches to solving civic challenges.”
As an adjunct to the SCA programme I put forward a proposal in 2017 for funding of a Code For Scotland programme, based on our experience as part of Code For Europe 2014 (PDF). There was general support for it, but it was put on hold at the time. Part of the idea behind that was to provide seed support for creating a grass-roots movement to work with data in each Scottish city. In the absence of that, or to complement it should it come about, we do need to create informal networks of open data groups across the country.
So, what’s missing?
I subscribe to the notion that data in public hands is a common asset – and should be treated as such: a concept sometimes referred to as a data commons. Getting to that position entails quite a change in thinking and action. A first step is to create open data, publishing that in a way that easily allows, or encourages, re-use, with clear permissive licensing.
Drawing from the points above, to achieve the potential offered by open data (and already realised in more progressive places) Scotland needs the following:
The Scottish Government, and its many branches, Local Government, Health Boards, and others must now demonstrate a commitment to publish open data. This should follow the Enschede model and implement an open-by-default data policy. This means having the policy formally adopted, published, and committed to by all managers and employees.
We need to stop seeing open data as a separate activity to an organisation’s other data governance. It is not. Open data can be regarded to some degree as a barometer of how well an organisation manages its data assets.
Government needs to move beyond a ‘build-it-and-they-will-come’ attitude to data publishing, and to work with all partners to make it usable, useful and used.
While publishing static open data at three-star level on the five star model is a useful starting point, it is not in itself an end. We need common standards such as DCAT to enable interoperability between data catalogues.
Collaboration is key – and organisations should band together to share some of the heavy lifting. This increases outcomes, improves standards and reduces local cost. We should bin the ‘not invented here’ mentality and look further afield for where work of high quality is taking place. We should share these best practices like this.
While we are on this topic, individual councils should abandon the “we’re special” mentality which surfaces far too often. All unitary authorities essentially provide the same bunch of services, and have the same core systems from a few suppliers. Each would benefit from increased co-operation, collaboration and common approaches to data management and publication.
Academia needs to get behind the open data movement. Data Lab and its many partner universities should be actively involved in the Scottish open data eco-system. MSc programmes (and undergraduate courses) should
regularly use open data,
teach how to make use of it,
show how to build new and innovative services, and
encourage students to be advocates for open data, to learn how to request it, and to act as intermediaries between the publisher and the citizen.
We should then extend that to school pupils – linking it to the curriculum, demonstrating how to use data, interpret and understand it, build with it.
Each local city region, at a minimum, should have an active open data group – and links between these should be encouraged. Funding for this core part of the eco-system should be seen by Scottish and Local government as an investment in the economic and social future of Scotland.
The whole is greater than the sum of the parts: recruiting and involving additional local partners, such as local businesses, to make their data open will significantly enhance what the data community can build or create.
We need more meet-ups, events, competitions, challenges, and opportunities for data scientists, coders, analysts to work with government data.
And what will you do?
As the old adage says, “If you are not part of the solution, you are part of the problem.” So my challenge to you is, whatever your role, what are you doing to bring this about?
For local government in particular, please stop boasting about what you are going to do. Do that thing whatever it is, make it live, publish the data, deliver that policy, live up to promises – then you can boast about it.
If you have a responsibility for data and you aren’t actively pushing for its release as open data then you are probably in the wrong job.
If you are a politician, or elected official, and you are not questioning why your organisation is not publishing open data and supporting its use then you should step down, and let someone who understands this stand for your seat.
If you work in Economic Development, Community Development, Health, Social Care, Transport, Environmental Services or anything else and you aren’t supporting a movement which can positively impact on your area of specialism then you need to rethink your commitment to that role.
If you find yourself justifying why you haven’t published, couldn’t get support, would have liked to but, didn’t get a budget, weren’t supported, ‘legal’ said no, the dog ate your data… please stop. I have heard excuses from all quarters for the last eight years. No more, please.
If you are an academic and your course neither makes use of, nor champions, open data, then revise your course materials (they could probably do with a refresh anyway).
If you are a developer, citizen, journalist, analyst – whatever – and you are not part of a local data meet-up, join one. If there isn’t one, start one.
If your local authority isn’t publishing open data, ask them why: lobby councillors, use FOI, get in the press.
Stop waiting for others to make stuff happen!
My intention is to write a follow up to this section, with a more detailed list of suggestions, links to handy guides, useful publications etc.
I am always up for a conversation about this. If you want to make the right things happen, and need advice, or guidance, for your organisation, business or community, then we can help you. Please get in touch. You can find me here or here or fill in this contact form and we will respond promptly.
In this, the first of two posts, I look back over eight years of open data in Scotland, showing where ambition and intent mostly didn’t deliver as we hoped.
In the next part I will look forward, examining how we should rectify things, engage the right people, build on current foundations, and how we all can be involved in making it work as we hoped it would all those years ago.
Let our story begin
“The moon was low down, and there was just the glimmer of the false dawn that comes about an hour before the real one.” – Rudyard Kipling, Plain Tales from the Hills, 1888
The story to-date of Open Data in Scotland is one of multiple false dawns. Are we at last about to witness a real sunrise after so much misplaced hope?
At Data Lab’s recent Innovation Week in Glasgow, I found myself among 115 other data science MSc students – some of the brightest and best in Scotland – working on seven different industry challenges. You can read more of how that went on my own blog. In this post I want to mention briefly one of the challenges, and the subsequent conversations which it stirred in the room, then on social media and even in email correspondence, then use that to illustrate my false dawn analogy.
The Innovation Week challenge was a simple one compared to some others, and was composed of two questions: “how might we analyse planning applications in light of biodiversity?”, and, “how might we evaluate the cumulative impact of planning applications across the 32 Scottish Local Authorities?”
These are, on the face of it, fairly easily answered. To make it even simpler, as part of the preparation for the innovation week, Data Lab, Snook and others had done some of the leg work for us. This included identifying the NBN Atlas system as one which contained over 219 million sightings of wildlife species, which could be queried easily and which provided open access to its data.
That should have been the difficult part. The other part, getting current and planning application data from the Scottish Local Authorities, should have been the easier task – but it was far from it. In fact, in the context of the time available to us, it was impossible as we could find not a single council, of the 32, offering its planning data as open data. You can read more of the particulars of that on my earlier blog posts, above.
This is about the general – not the specific, so, for now, let us set some context to this, and perhaps see how we got to this point.
The first false dawn.
We start in August 2010, when I was working in Aberdeen City Council. I’d been reading quite a bit about open data, and following what a few enlightened individuals, such as Chris Taggart, were doing. It seemed to me so obvious that open data could deliver so much socially and economically – even if no formal studies had by then been published. So, since it was a no-brainer, I arranged for us to publish the first open data in Scotland – at least from a Scottish City Council.
The UK Coalition Government had, in 2010, put Open Data front and centre. They created http://data.gov.uk and mandated a transparency agenda for England and Wales which necessitated publishing Open Data for all LA transactions over £500.
At some point thereafter, in 2011-12 both Edinburgh and Glasgow councils started to produce some open data. Sally Kerr in Edinburgh became their champion – and began working with Ewan Klein in Edinburgh University to get things moving there. I can’t track the exact dates. If you can help me, please let me know and I will update this post.
In 2012 the Open Data Institute was founded by Nigel Shadbolt and Tim Berners-Lee, and from day one championed open data as a public good, stressing the need for effective governance models to protect it.
During 2012 and 2013 Aberdeen, Edinburgh and others started work with Nesta Scotland, run out of Dundee, by the inspirational Jackie McKenzie and her amazing team. They funded two collaborative programmes: Make It Local Scotland and Open Data Scotland.
The former had Aberdeen City Council using Linked Open Data (another leap forward) to create a citizen-driven alerts system for road travel disruption. This was built by Bill Roberts and his team at Swirrl – who have gone on to do more excellent work in this area.
Around mid 2013 Glasgow had received Technology Strategy Board funding for a future cities demonstrator and was recruiting people to work on its open data programme.
The second Nesta programme, Open Data Scotland, saw two cities – Aberdeen and Edinburgh – work with two rural councils, East Lothian and Clackmannanshire. Crucially, it linked us all with the Code For Europe movement, and we were able to see at first-hand the amazing work being done in Amsterdam, Helsinki, Barcelona, Berlin and elsewhere. It felt that we were part of something bigger, and unstoppable.
And it gets real-er
In late 2014 the Scottish Government appeared to suddenly ‘get’ open data. They wanted a strategy – so they pulled a bunch of us together to write one. The group included Sally from Edinburgh and me – and the document was published in March 2015. I had pushed for it to have more teeth than it ended up having, and to commit to defined actions, putting an onus on departments and local government to deliver widely on this in a tight timescale.
It did include –
“To realise our vision and to meet the growing interest from users we encourage all organisations to have an Open Data publication plan in place and published on their website by December 2015. Organisations currently publishing data in a format which does not readily support re-use, should within their plan identify when the data will be made available in a more re-usable format. The ambition is for all data by 2017 to be published in a format of 3* or above.” I will come back to this later.
This MUST be it!
In 2016-2017 the Scottish Cities Alliance, supported by the European Regional Development Fund launched a programme: Scotland’s Eighth City – The Smart City. At its heart was data – and more specifically open data. The data project was to feature all seven of Scotland’s cities, working on four streams of work:
data engagement and
The perception was also at that time that the Scottish Government had taken its eye off the ball as regards open data. Little if anything had changed as a result of the 2015 strategy. By working together as 7 cities we could lead the way – and get the other 25 councils, and the Scottish Government themselves, not only to take notice, but also to work with us to put Open Data at the heart of Scottish public services.
The programme would run from Jan 2017 to Dec 2018. I was asked to lead it, which I was delighted to do – and remained involved in that way until I retired from Aberdeen City Council in June 2017.
At that point Aberdeen abandoned all commitment to open data and withdrew from the SCA programme. I have no first-hand knowledge of the SCA programme as it stands now.
Six False Dawns Later
So, after six false dawns what is the state of open data in Scotland: is it where we expected it to be? The short answer to that has to be a resounding no.
Some of the developments which should have acted as beacons have been abandoned. The few open data portals we have are, with some newer exceptions, looking pretty neglected: data is incomplete or out of date. There is no national co-ordination of effort, no clear sets of guidance, no agreement on standards or terminologies, no technical co-ordination.
Activity, where it happens at all, is localised, and is more often than not grass-roots driven (which is not in itself a bad thing). In some cases local authorities are being shamed into reinstating their programmes by community groups.
The Scottish Government, with the exception of their SIMD Linked Data work, which was again built by Swirrl, and some statistical data, have produced shamefully little Open Data since their 2015 Strategy.
Despite a number of key players in the examples above still being around, in one role or another, and a growing body of evidence demonstrating ROI, there is strong evidence that Senior Managers, Elected Members and others don’t understand the socio-economic benefits that publishing open data can bring. This is particularly disturbing considering the shrinking budgets and the need to be more efficient and effective.
So, what now?
Given that we have witnessed these many false dawns, when will the real sunrise be? What will trigger that, and what can we each do to make it happen?
I know you’re tired of reading about GDPR already, so I’ll keep this brief.
There has been a huge amount written on this topic in the last few months; just look at the Google Trends chart:
This can make GDPR seem like an avalanche that is about to hit, and I’ve heard a number of people talking about ‘delete to be on the safe side’ as an appropriate GDPR strategy. ‘Better safe than sorry’ etc…
Now, for some businesses this may be appropriate, but if you are running a small non-profit or charity I’d urge you to think hard before going down the baby/bathwater route.
I can’t advise you on implementing full GDPR policies and procedures in this post, but I can give you a pointer towards one factor that is often overlooked, and which can help you avoid the baby/bathwater mistake.
Put a value on your data
If you look at all your data and ask a few simple questions you will have a more useful segmentation of the data you care about, the data you don’t care about, and the data that is so burdensome that you likely should go down the deletion route:
Is the data directly used in your service provision?
Do you use and update the data regularly?
Both of these questions can be answered on a sliding scale. Quickly sketch out the following grid with four boxes:
Now write every bundle of data into the box that best represents how it is used. You likely have a dataset of users to whom you send information. If this is a core part of the service you provide then that dataset goes in box 1, top right.
In contrast, you may have a dataset of everyone that ever served as a trustee for similar sized organisations in the UK. While that data is interesting, could be useful for networking, and perhaps took time to acquire – it’s likely not core to your operations and is infrequently used. That dataset goes in box 3.
Ignore sunk cost
It’s tempting to look at the cost of acquisition of a dataset in this context. “But we spent so long building that trustee list”. Resist this temptation. If the data has little or no utility to your core operations – it should go in box 3 or 4.
Assigning ‘data care’ resources
This process can dramatically reduce the volume of data that you hold by focusing on the core value of the data in terms of service provision. This can make the care and maintenance of the data more achievable.
Box 1 – Start here. Focus all your efforts on ensuring that you are fully compliant for this data, as you likely can’t operate without it.
Box 2 – In most cases you should treat this as Box 1. If you are truly resource constrained however, make sure everything in box 1 is handled as a priority before tackling box 2.
Box 3 – This is easy. There will be data in here that you simply don’t need. Delete it.
Box 4 – This is more tricky, and is likely a bigger question than simply one of data retention. If this data isn’t related to service delivery, but you use it a lot, what is it? This might be a sign that some of your operations need to be re-thought. Are you using resource on ‘nice to have’ services, rather than on core work?
It doesn’t eliminate the need to be GDPR compliant, but it likely simplifies the requirement, and protects against valuable data being lost or overlooked in the process.
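If you keep your data inventory in a spreadsheet or a script, the four-box sort can be sketched in a few lines of Python. This is only an illustration, not a compliance tool. The class, function and field names are made up for this example, and the two questions are reduced to yes/no answers although they really sit on a sliding scale. It also assumes that box 2 is the remaining combination, data that is core to the service but not used regularly, which the text above implies but does not state.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """One bundle of data (hypothetical structure for this sketch)."""
    name: str
    core_to_service: bool  # Is the data directly used in your service provision?
    used_regularly: bool   # Do you use and update the data regularly?


def box_for(ds: Dataset) -> int:
    """Return the box (1-4) for a dataset, following the grid described above."""
    if ds.core_to_service and ds.used_regularly:
        return 1  # core and in regular use
    if ds.core_to_service:
        return 2  # assumed: core, but rarely used
    if not ds.used_regularly:
        return 3  # not core and rarely used
    return 4      # not core, yet used a lot


# Suggested next step per box, paraphrasing the advice above.
ACTIONS = {
    1: "start here: make this data fully compliant",
    2: "treat as box 1, once box 1 is handled",
    3: "delete what you do not need",
    4: "ask whether this activity is core work",
}


def triage(datasets):
    """Map each dataset name to its (box, suggested action)."""
    return {ds.name: (box_for(ds), ACTIONS[box_for(ds)]) for ds in datasets}


if __name__ == "__main__":
    inventory = [
        Dataset("service users mailing list", core_to_service=True, used_regularly=True),
        Dataset("UK trustee list", core_to_service=False, used_regularly=False),
    ]
    for name, (box, action) in triage(inventory).items():
        print(f"{name}: box {box} ({action})")
```

Note that nothing in the sketch records how much a dataset cost to acquire. That is deliberate: sunk cost plays no part in the sort.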
Finally – a quick note on fundraising. I would count fundraising as a core service of the organisation, so use the same process as above for any donor datasets.