Open Data Roundtables Summary Report
In March, April and May the Treasury Board Secretariat held a series of public consultations about the Federal Government's Open Data Portal in preparation for its re-launch. Five consultations took place, in Toronto, Edmonton, Vancouver, Ottawa and Montreal. Stakeholders from academia, software development, the non-profit sector, the technology sector, local and provincial governments and others were invited and participated. Across the five events, 82 stakeholders took part. In addition, the Minister participated for the majority of the sessions in Edmonton and Toronto, and gave opening remarks at the session in Ottawa.
Each consultation lasted two hours, during which six questions (see Appendix A) were asked. For the first two questions stakeholders were broken into small groups of four where they discussed and debated their thoughts. The groups then reassembled into a plenary session and shared their feedback. For the last four questions, answers were shared and discussed in a plenary session. Question 3 was added only for the Vancouver, Montreal and Ottawa consultations. Finally, while participants enjoyed discussing questions 1, 2 and 6, questions 4 and 5 received more muted feedback. Many had already contributed to the consultation on the licence and were more interested in talking about the data the government had to offer.
This report makes no effort to prioritize the needs of one stakeholder over another. Rather, it tries to reflect the conversation and the recommendations made by the stakeholders, noting both common interests and points of divergence.
What follows is a distillation of this feedback.
While the conversation about how to improve the government’s open data portal was rich, and numerous recommendations are outlined in this report, five core recommendations emerged in every session and were cited most often. These include:
- Establish Goals
Have a clear statement of purpose for Data.gc.ca that makes its purpose and mission clear to citizens, users and government officials.
- Improved Search
Stakeholders almost universally agreed that the search features on data.gc.ca need significant improvement. They asked for better search, peer recommendations and tagging to facilitate data discovery.
- Metadata and Documentation
In order to make effective use of data, users need access to metadata and documentation so they can know what the data means. Participants wanted the government to include more documentation and adhere to metadata standards wherever possible.
- Community Building, Engagement & Education
Develop a community engagement strategy to connect and engage data users. In addition, build better tools to inform users of changes and updates.
- The Data
While stakeholders indicated a desire for more of all types of data, some broad themes stood out. Postal code data and other “core” datasets that serve as connective tissue in mash-ups and analysis are particularly important. In addition, users expressed an interest in having more data be geographically tagged. There was also a desire for data to be as granular as possible, as well as longitudinal, so that long-term trends could be identified.
Recommendations from the Groups
1. Establish a Goal
- Have a clear statement of purpose for Data.gc.ca that makes its purpose and mission clear to citizens, users and government officials. This document should be used to help prioritize:
- The allocation of resources (e.g. “community engagement and training” versus “targeted disclosure to advance policy goals” versus “improving the metadata of existing datasets”).
- The different and sometimes competing demands that different user groups will have of an open data portal.
What was said:
While the questions served as a useful prompt to get the stakeholders talking about the government’s data portal, they almost inevitably led to a larger discussion about two intertwined challenges that were discussed explicitly or implicitly in each of the sessions: What is the goal of data.gc.ca and who is the audience?
As the stakeholders dug deeper into the questions confronting Treasury Board they came to respect the magnitude of the challenge of creating the next federal government open data portal. The diversity of roles and goals of the stakeholders who participated in the events reflects the diverse needs and potential goals the open data portal could, or may even need to, serve. Transparency advocates sought data about government accountability; academics cared about provenance, metadata and context; software developers and business sectors cared about data that would provide business intelligence or front line service information; and local and provincial counterparts were interested in exploring how data from different jurisdictions could be rolled up and shared more effectively. And this is to say nothing of numerous other stakeholder groups and sub-groups, or the many Canadians who will come to the portal with little sense of purpose and potentially little experience in working with data.
That said, the stakeholders themselves often acknowledged that it may not be possible to please all users equally. As a result many expressed a desire for the government to be more explicit about its goals for its open data portal. Is the portal’s primary goal to make government more transparent? To improve access to data that engages the policy or academic spheres? To make it possible for more vendors to design products, either for citizens or for government, to improve services and/or increase competition in the vendor space? Or is the goal to help foster a data literate citizenry better able to think critically and participate in a 21st century economy? It may end up being all of these goals – but any effort to devise a new data portal should at least grapple with this question so that citizens, users and the government officials who implement the portal all have clarity about its purpose. That clarity would leave them better able to use and evaluate it and, in the case of officials, to design, support and build a strategy around it.
2. Improved Search
- Improve the search feature of any future site to make it easier for people to find the data they need
- Consider having multiple portals (or at least ensure continuation of current portals) that focus on specific issues (e.g. geographic data, statistical data or environmental data) and speak to critical stakeholder groups such as academics and policy makers at the local, provincial and federal levels
- Help users create their own ways of navigating datasets through tags
- Engage librarians in the design and organizing of data
- Explore possibilities of enabling search across data jurisdictions so that comparable datasets held at the local and provincial level are also displayed
What was said:
The ability to find government data is one of, if not the, core responsibilities of an open data portal. Without effective search, users and citizens will never be able to find what they need or are curious to know more about. Reducing the friction cost around finding data is no guarantee of broad and wide usage, but high friction costs will guarantee little or no use of government data.
Almost without fail the single greatest complaint about the Federal Government’s Open Data Portal was the quality of its search. Participants talked of searching for datasets they knew existed but could not find without typing an exact phrase or knowing a key term. Connecting users with the data they are looking for is the primary purpose of a data portal, so getting this right is imperative. In addition, if the government continues to follow the UK and US and make more data available, search will only increase in importance as the number, type and diversity of datasets grow.
Numerous other search-related features were mentioned during the consultations. Many users wished that better “recommendations” appeared, e.g. users who found one dataset would be shown corresponding datasets likely to be of interest. Others suggested that both official and user generated tags be applied to datasets so that users can sort through terms that may be more accessible to lay people and other non-experts. Indeed, one Toronto participant claimed that on a library website user generated tags were more widely used to navigate the website than the formal cataloguing structure.
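The tagging and recommendation ideas above can be illustrated with a minimal sketch. It assumes a tiny in-memory catalogue in which each dataset carries both official and user generated tags; the dataset names, the tags and the functions `search_by_tag` and `related_datasets` are all hypothetical and do not describe how data.gc.ca works. "Related" datasets are ranked here by simple tag overlap, which is only one of many possible approaches.

```python
from collections import defaultdict

# Hypothetical catalogue entries: names and tags are invented for illustration.
CATALOGUE = {
    "census-population": {"official": {"demography", "census"},
                          "user": {"population", "people"}},
    "health-regions": {"official": {"boundaries", "health"},
                       "user": {"maps", "hospitals"}},
    "postal-codes": {"official": {"boundaries", "addressing"},
                     "user": {"maps", "postcode"}},
}


def all_tags(dataset_id):
    """Combine official and user generated tags for one dataset."""
    entry = CATALOGUE[dataset_id]
    return entry["official"] | entry["user"]


def build_tag_index(catalogue):
    """Map every tag to the set of datasets that carry it."""
    index = defaultdict(set)
    for dataset_id in catalogue:
        for tag in all_tags(dataset_id):
            index[tag].add(dataset_id)
    return index


def search_by_tag(index, tag):
    """Return dataset ids carrying the tag, in alphabetical order."""
    return sorted(index.get(tag.lower(), set()))


def related_datasets(dataset_id):
    """Rank other datasets by the share of tags they have in common."""
    own = all_tags(dataset_id)
    scores = []
    for other in CATALOGUE:
        if other == dataset_id:
            continue
        other_tags = all_tags(other)
        overlap = len(own & other_tags) / len(own | other_tags)
        if overlap > 0:
            scores.append((overlap, other))
    # Highest overlap first; ties broken alphabetically.
    return [name for _, name in sorted(scores, key=lambda pair: (-pair[0], pair[1]))]


TAG_INDEX = build_tag_index(CATALOGUE)
```

In this sketch a lay term such as "maps", which appears only as a user generated tag, is enough to surface both boundary datasets even though neither uses that word in its official tags.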
Several users discussed how many datasets on data.gc.ca could also be found on other, more focused governmental data portals (such as GeoGratis). Because these portals target a narrower audience they often provide both more effective search (false positives of unrelated data don’t clutter up results) and metadata and other information in a form more readily understood by the target (albeit usually specially skilled) group. Two interests were at play here. One was a clear desire for thematic search. The other was, again, recognition that a number of the best practices cited here are already in use on these more focused portals. There was real concern that a single government open data portal would spell the end of these more targeted open data sites, many of which have both formal and informal user communities built around them.
Finally, since search is critical to the success of a data portal, stakeholders were in virtually universal agreement that the government should engage the expertise of data librarians in the creation and organizing of any next generation open data portal.
As an illustration of how hard it can be in Canada to get data that exists, one Montreal stakeholder told a great story of how he needed railway routes to create a timetable application. He initially approached VIA, which claimed it did not have the data and pointed him to CN. CN claimed it wasn’t sure if it had the data. In the end the developer was able to secure the data from an American government agency which had rail routes for all of North America.
3. Metadata and Documentation
- Adopt metadata standards as quickly as possible
- Enable users to add metadata where missing from datasets
- Metadata may need to engage a range of users from deep subject matter experts to laypeople.
What was said:
Metadata and documentation are information about the data: what it means and how it was measured and collected. Both effective and responsible use of data require good metadata, particularly when users may attempt to apply the data to a purpose for which it was not originally intended.
Almost all stakeholders agreed that having better metadata was critical to drive use. However, it was clear there were differing opinions about what metadata meant to different stakeholders.
Those within the academic, policy and librarian community had a stronger sense of specific requirements they wanted adhered to (such as the Dublin Core metadata standard), whereas developers and others – while not uninterested in such metadata – were also interested in information that might be more immediately accessible to the layperson. Documentation about what the data was collected for was desired by most participants, as was, critically, contact information for the government employee who created and/or served as the custodian of the dataset. The ability to connect directly with the data source – the person responsible – was seen as critical by many stakeholders from across all sectors.
There was also wide recognition that metadata can be hard to create – particularly after a dataset has been created. Many stakeholders strongly urged the government to enable users to add missing metadata much like the Library of Congress has done with pictures (users can tag photos helping improve the catalog). Such metadata could come with a warning that it is user created and therefore less reliable – but it would nonetheless be better than nothing and could help engage new users of the site.
4. Community Building, Engagement & Education
- Draft a community building strategy and ensure it is resourced
- Invite data creators and users across government, and outside government to write about their experiences on the open data portal
- Make processes around requesting data more transparent and inform the user of timelines for responses
- Post currently requested datasets and previously rejected ones so that community members can avoid requesting duplicates and can upvote existing requests
- Go to where the users are and/or create user friendly spaces where people can congregate. There are a number of mailing lists where data users “hang out” and many more that are highly specialized (e.g. GIS users or housing experts)
- Create communities around specific datasets or even categories of data.
- Consider adding an education function to the website, including the provision of tools so that all Canadians can work with, and learn to use, public data.
Engagement and community building, while related, speak to two separate desires expressed by stakeholders. Engagement revolved principally around using the website to close feedback loops – both from the open data program to users and vice-versa. Community referred to the ability to connect with both the open data program managers and other users, with the goal of asking and answering questions, learning about data uses, building skills and even simply getting inspired.
While many users appreciated the efforts made to date by the team running the open data portal, most agreed that there was significant opportunity both to improve engagement – which is not seen as strong – and to build community – which is broadly perceived as non-existent.
Many users reported that feedback provided on the website frequently received neither an acknowledgment of receipt nor a response – particularly when requesting a dataset. Some stakeholders reported that this caused users to lose confidence in the site and in other features that required their input. Most agreed that when users are solicited for feedback, or when requests for information are made, it would be nice to have a confirmation of receipt or acknowledgment, as well as a sense of the timeline for a response. Not one stakeholder requested immediate responses – just greater awareness of where they were within a process.
As part of the above conversation there was also a desire to be able to log errors with datasets so that they could be acknowledged and, if appropriate, resolved. More advanced users talked about a desire to be able to fork (duplicate) datasets and then share these forks back with the government for other users.
There was also a sense of frustration that the portal did little to keep users up to date on basic issues like changes to a dataset. A data user is likely to be very interested any time a dataset changes, and there was much interest in some sort of notification system for when a dataset is updated or becomes disputed.
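One way such a notification system could detect updates is sketched below. This is a minimal illustration rather than a description of any existing portal feature: the `DatasetWatcher` class, its methods and the example address are hypothetical, and the sketch assumes the portal can periodically read the raw bytes of each published file. A change is detected by comparing a SHA-256 fingerprint of the file against the last one recorded.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Return a SHA-256 hex digest that changes whenever the content changes."""
    return hashlib.sha256(content).hexdigest()


class DatasetWatcher:
    """Hypothetical helper that tracks dataset fingerprints and subscribers."""

    def __init__(self):
        self._fingerprints = {}  # dataset id -> last recorded fingerprint
        self._subscribers = {}   # dataset id -> list of contact addresses

    def subscribe(self, dataset_id, address):
        """Register an address to be told when the dataset changes."""
        self._subscribers.setdefault(dataset_id, []).append(address)

    def check(self, dataset_id, content):
        """Record the current content and return the notices to send.

        The first time a dataset is seen there is nothing to compare
        against, so no notices are produced.
        """
        new = fingerprint(content)
        old = self._fingerprints.get(dataset_id)
        self._fingerprints[dataset_id] = new
        if old is None or old == new:
            return []
        message = f"Dataset '{dataset_id}' was updated."
        return [(address, message) for address in self._subscribers.get(dataset_id, [])]
```

The sketch only returns the notices; actually delivering them (by e-mail, feed or mailing list) and flagging a dataset as disputed would need separate handling.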
Officials from provincial governments – particularly British Columbia – shared their experience around the need to build community to foster data use. Many users cited British Columbia’s website, which features regular blog posts highlighting new datasets, while others referenced the City of Vancouver’s mailing list, which alerts subscribers to new datasets. In addition, a community created Google group used by data users in Vancouver, in which municipal staff regularly participated by answering questions and even occasionally asking for advice, was cited as a great example of effective community engagement. So was a provincial open data Google group that enjoys similar participation from provincial public servants.
Stakeholders were deeply appreciative that government employees participated in these groups directly – answering questions and addressing requests promptly without having to move through a communications officer. Such groups built both community and trust. Many stakeholders felt that community building and engagement would require dedicated resources and could not be an afterthought of any open data program. Certainly the examples of the United Kingdom, the United States and British Columbia were widely cited, as these jurisdictions have staff dedicated to engaging with and cultivating users.
Many stakeholders from across the sectors felt that the open data portal provided the government with an important opportunity to raise awareness of, and even provide training in support of, data literacy. In some sessions stakeholders expressed the hope that there might be an Open Data 101 or master classes to help explain how to use datasets in a variety of ways. There was a general acknowledgement that those participating in the roundtables were not everyday Canadians, but people already interested in, and often using, data. It was felt there may be a need to showcase what is possible to those less familiar with using data.
However, while there is clearly a need for more education, the government will need to decide if, and to what degree, this is one of the core goals of the portal. Participants generally felt more was better from both a social equity and digital economy perspective. In addition, education and community engagement may actually help lower the costs of managing the open data portal, as the capacity of current users could be leveraged to help one another. As well, new users could potentially lighten the demand on government resources. As such, whether explicitly or implicitly, there will be pressure on the government to include an educational element in its open data programming.
5. The Data
- Further ascertain what are “core” datasets and prioritize their release
- Make it easier for data to be organized by geography – both by releasing geographic “shapes” that are relevant to users – as well as by organizing data along these geographic boundaries.
- Create a canonical dataset of data held by the government so that users can better prioritize data they wish released and better understand why some datasets cannot be released.
- Consider the open data portal an archival point as well as distribution point for data as interest will exist in not only recent but also historical data.
- For large datasets, enable users to download a sample set they can play with.
What was said:
While virtually all stakeholders had visited data.gc.ca at some point, only 50-70% had ever downloaded a dataset and a smaller number, roughly 15-20%, had actually used a dataset in an analysis or application. Some of this was a result of an inability to find interesting datasets. So what did users want? Here they were less clear, as interests were as varied as the participants. However, some core themes did emerge.
- Core Datasets
- Users identified some datasets as “keystone” datasets. These data are often essential in the use of other datasets as they serve as unique identifiers for various objects. Since these datasets are often the building blocks of analysis and applications it was widely agreed that making them open and available should be a priority.
- Among these datasets were: Postal Codes, Non-Profit and Corporate Registry databases, boundary data such as health authorities and other jurisdictional data. In addition, some datasets, such as government spending and budget data, were seen not as core per se, but nonetheless distinctly important.
- Organize by Geography
- Connecting data to geography is one of the most popular uses, both for analysis and application development. Many users noted that presently there is no common order by which data is organized across the federal government – no canonical reference map that would make it easy to mash up, say, population and population density for a given area or region. The closest existing infrastructure is the census tract, which is frequently a very abstract area for most users; they are more interested in community or city level datasets. Helping the government define some common geographies that would make its data more accessible and interoperable was mentioned several times by users from all sectors.
- Users noted that such a project – which would be helpful within the federal government's structure – would also be helpful if it reached beyond the federal government. Many relevant datasets, such as health authorities and school districts, reside at the provincial or even local level. The federal government could take a convening role in trying to establish standards in this space.
- Longitudinal Data
- In addition to organizing data by geography there was a marked interest in longitudinal data. This interest was particularly strong in the non-profit, government, academic and policy sectors. For many stakeholders the line between “current” and “archived” data was blurry, and the ability to access any date range was seen as highly desirable.
- Data of the data
- Another dataset that virtually everyone agreed would be helpful was a master data file about all data the government collects. There are currently three types of data:
- Data the government can share;
- Data that it cannot share because of privacy or security reasons; and,
- Data that it won’t disclose it possesses for reasons of security.
- Stakeholders wanted a list of the first two. They were quite clear that they understood this included data they would never be privy to (for reasons of privacy or security) but knowing of its existence would help prioritize data requests and reduce the likelihood of asking over and over for datasets that cannot be released. In addition, stakeholders interested in privacy were particularly interested in the metadata about datasets that could not be made open: they want to know what governments are collecting, even if they don't want to see the actual data, in order to see how the government is dealing with privacy.
- Finally, some developers expressed an interest in an API for a dataset of what is on the catalog since this would make creating alternative portals or search engines possible. At a minimum, expert users might be able to comb through the data more effectively.
- The more Granular the better
- All the stakeholders were concerned about protecting individuals' privacy and, as long as this could be maintained, there was a real desire for data that is as granular as possible. With many datasets there is no dilemma around privacy or security, and so there was real frustration expressed about the lack of granularity available. Take wheat production as an example: knowing the bushels produced by Canada in a year is not as interesting as knowing the numbers produced by each province, and it is more interesting still to know production by region and, even more so, by sub-region, temperate zone, etc. All too often a “dataset” is simply an aggregation of what is much more granular data. What stakeholders want access to is that more granular data, because that is where the rich analysis and application development opportunities lie.
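The point about granularity can be made concrete with a small sketch. The figures, province names and region names below are invented purely for illustration and are not real production data; the `roll_up` helper is likewise hypothetical. The sketch shows that regional records can always be rolled up into provincial and national totals, whereas a published national total cannot be broken back down into its parts.

```python
from collections import defaultdict

# Invented example records: (province, region, bushels produced).
# None of these figures are real.
RECORDS = [
    ("Province A", "Region 1", 120),
    ("Province A", "Region 2", 200),
    ("Province B", "Region 1", 90),
    ("Province B", "Region 2", 150),
]


def roll_up(records, key_index):
    """Sum the last field of each record, grouped by the chosen field."""
    totals = defaultdict(int)
    for record in records:
        totals[record[key_index]] += record[-1]
    return dict(totals)


# Granular data supports every coarser view...
by_province = roll_up(RECORDS, 0)
national_total = sum(by_province.values())

# ...but the reverse is impossible: nothing in national_total alone
# reveals how production was split between provinces or regions.
```

Publishing the most granular version that privacy and security allow therefore serves both the users who want totals and those who want detail.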
6. Additional Recommendations
- There was strong support for a common pan-Canadian licence. All stakeholders agreed that having municipalities, provinces and the federal government using the same licence would make it easier to re-use data.
- Consider developing a core set of user types or “personas” – developers, researchers and average citizens. Focus on the top three, and develop strategies and design concepts that would allow them to quickly accomplish their goals.
- If the desire is to attract non-specialist users, the site and its datasets should be carefully vetted for acronyms, which create barriers to learning. The work of the UK's GDS should be seen as a model here.
- The government needs to consider how it will handle very large datasets, particularly those that change frequently. One stakeholder talked about how a government authority repeatedly sends him CD-ROMs of datasets because it considers this an easier or more desirable way to share the data than making it available online.
- There could be a role for the federal government in fostering and supporting various data standards. While such a program would be more comprehensive than simple tools on a website, such a toolset might spur vendors, government officials and other actors to propose and champion standards that would enable communities to be better served.
- The demand for APIs was very mixed. Some developers and researchers expressed a strong desire for them while others preferred a bulk data download. However, many stakeholders agreed that, as the government gives data away, it is getting harder and harder to know who is using the data and for what purpose, making it difficult to share success stories and offer examples. APIs might offer a way to identify users and, with permission, better understand their use cases and share their successes.
- A number of current actions by governments and agencies raised concerns from virtually all stakeholders at all the meetings. These included:
- Canada Post's lawsuit over Postal Code data. It was a source of frustration and concern to researchers, and many developers and members of private industry raised concerns about the “chilling effect” it was having on their desire to innovate with GC open data.
- At four of the five meetings, large numbers of stakeholders raised concerns about the termination of the mandatory long form census.
Appendix A: Questions asked each group
Not every group was asked every question since during some consultations time ran out.
- The Government of Canada will be releasing the next-generation open data portal this spring. From a user perspective, what are some of the specific features you would most want to see available to you via the portal?
- Who has used data from data.gc.ca and what have you done with it?
- What specific data, either already available or not yet available, are of most interest to you?
- What are some criteria for open datasets that would be valuable?
- Beyond adding datasets and incorporating interactive features on the open data portal, what else can the federal government do in its delivery of open data that would facilitate app and usable analysis development?
- What social media tools should the Government of Canada use to reach open data users?
- Work is underway to finalize a common Open Government Licence for municipal, provincial, territorial and federal data. What are your thoughts or suggestions on the importance of effective licensing on open data activities?
- Are you familiar with the proposed licence? How effective do you think it will be in supporting the use of datasets? (across multiple jurisdictions follow up).
- Are there other open data standards that should be prioritized to help you as an open data user and app developer?
Appendix B: Datasets requested
What follows is an unfiltered list of datasets that were requested:
- Homeless families and individuals systems
- Land use data
- How traffic habits have changed - Feds to gather regional data
- Forestry data
- Health data
- Critical issue around access to data
- Generational shifts
- Transportation movement of goods across Canada – like to see this Transport bought this from Stats Can
- Voluntary sector
- Administrative datasets ... help for seniors
- Immigration data
- Natural resources data
- Building and construction – building activities
- Labour force survey- more frequent than every 5 years
- CIC landings data
- Postal Codes data
- Walking patterns for postal delivery routes – i.e. most efficient routes can help politicians, girl guide cookies, etc…
- Federal financial transactions within 48 of transaction taking place
- Datasets of all datasets (including those not available for release on the open data site for reasons of privacy or security – knowing they exist could cut down on useless requests)
- CMHC - housing data
- Immigration and location, per age, country of origin - where they go, what lifestyle do they have
- Where are people moving in the country
- Transit from City to countryside - lifestyle changes
- Base map of Canada - zoom down on neighbourhood data
- Federal Grant Data
- Land Classification – come from AG, Forestry, etc- bring them all together.
- API for data catalogue – would be great to do this across all jurisdictions.
- CRA data – bringing it down to Census tract level
- Budget data
- Corporate registry data
- Elections data – turnout by poll, GIS by poll, campaign contributions
- High School Drop out rates
- City - bike routes
- Infrastructures (roads, walkable, navigable, etc ways)
The Open Data Roundtables Summary Report was developed by David Eaves.