Open Data 101
From what Open Data is, to understanding how to use it, this guide is a quick reference for everything you need to know about Canada's Open Data initiative.
Do you know how much of your tax money is spent on government contracts? Interested in determining the fuel consumption of a car you want to purchase? Would you like to know how many permanent resident visa applications were received in each province?
These questions, and many more, can be answered by looking at Open Data made available on this site. Open Data is a practice that makes machine-readable data freely available, easy to access, and most importantly, simple to reuse.
Technology has provided the capability to distribute large amounts of data and information through many different platforms on a vast array of subjects. The Government of Canada believes it is important to provide Canadians with access to the data that is produced, collected, and used by departments and agencies across the federal government. It is equally important that the data is made available through a single and searchable window. We have consolidated data from across government departments and agencies and have provided access to them all in one place.
In 2009, Open Data became visible in the mainstream, with various governments, such as those of the USA, the UK, Canada and New Zealand, announcing new initiatives to open up public information. In March 2011, the Government of Canada (GC) launched its first-generation Open Data Portal, data.gc.ca, to support the delivery of GC Open Data in machine-readable formats.
In June 2013, the second-generation Open Government Portal was launched with additional functionality to highlight Open Information and Open Dialogue efforts in addition to Open Data.
Canada has now become an international Open Data leader, currently chairing an international Open Data Working Group through its involvement in the Open Government Partnership.
What is Open Data?
Open Data is defined as structured data that is machine-readable, freely shared, used and built on without restrictions.
The Open Definition provides a more detailed definition of Open Data. To summarize the most important points:
- Availability and Access: the data must be available as a whole and at no more than a reasonable reproduction cost, preferably by downloading over the internet. The data must also be available in a convenient and modifiable form.
- Re-use and Redistribution: the data must be provided under terms that permit re-use and redistribution including the intermixing with other datasets.
- Universal Participation: everyone must be able to use, re-use and redistribute. There should be no discrimination against fields of endeavour or against persons or groups. For example, 'non-commercial' restrictions that would prevent 'commercial' use, or restrictions of use for certain purposes (e.g. only in education), are not allowed.
Canada's Open Data Principles
The Government of Canada has established the following Open Data principles based on the Sunlight Foundation's "Ten Principles for Opening up Government Information".
1. Completeness
Datasets should be as complete as possible, reflecting the entirety of what is recorded about a particular subject. All raw information from a dataset should be released to the public, unless there are Access to Information or Privacy issues. Metadata that defines and explains the raw data should be included, along with explanations of how the data was calculated.
2. Primacy
Datasets should come from a primary source. This includes the original information collected by the Government of Canada and available details on how the data was collected. Public dissemination will allow users to verify that information was collected properly and recorded accurately.
3. Timeliness
Datasets released by the Government of Canada should be made available to the public in a timely fashion. Whenever feasible, information collected by the Government of Canada should be released as quickly as it is collected. Priority should be given to data whose utility is time sensitive.
4. Ease of Physical and Electronic Access
5. Machine readability
Machines can handle certain kinds of inputs much better than others. Datasets released by the Government of Canada should be stored in widely-used file formats that easily lend themselves to machine processing (e.g. CSV, XML). These files should be accompanied by documentation related to the format and how to use it in relation to the data.
6. Non-discrimination
Non-discrimination refers to who can access data and how they must do so. Barriers to use of data can include registration or membership requirements. Datasets released by the Government of Canada should have as few barriers to use as possible. Non-discriminatory access to data should enable any person to access the data at any time without having to identify themselves or provide any justification for doing so.
7. Use of Commonly Owned Standards
Commonly owned standards refer to who owns the format in which data is stored. For example, if only one company manufactures the program that can read a file where data is stored, access to that information is dependent upon use of that company's program. Sometimes that program is unavailable to the public at any cost, or is available, but for a fee. Removing this cost makes the data available to a wider pool of potential users. Datasets released by the Government of Canada should be in freely available file formats as often as possible.
8. Licensing
The Government of Canada releases datasets under the Open Government Licence – Canada agreement. The licence is designed to increase openness and minimize restrictions on the use of the data.
9. Permanence
The capability of finding information over time is referred to as permanence. For best use by the public, information made available online should remain online, with appropriate version-tracking and archiving over time.
10. Usage Costs
The Government of Canada releases the data on the Open Government site free of charge.
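Principle 5 above names CSV and XML as formats that lend themselves to machine processing. The short Python sketch below shows what that means in practice: the same small table can be read from either format with standard-library tools and then summed or compared without any manual re-keying. The sample records, column names, and element names are invented purely for illustration and do not come from any actual Government of Canada dataset.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data: names and numbers are made up for illustration.
CSV_TEXT = """province,applications
Ontario,120
Quebec,80
Alberta,50
"""

XML_TEXT = """<records>
  <record><province>Ontario</province><applications>120</applications></record>
  <record><province>Quebec</province><applications>80</applications></record>
  <record><province>Alberta</province><applications>50</applications></record>
</records>"""


def totals_from_csv(text):
    """Return {province: applications} parsed from CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["province"]: int(row["applications"]) for row in reader}


def totals_from_xml(text):
    """Return {province: applications} parsed from XML text."""
    root = ET.fromstring(text)
    return {
        record.findtext("province"): int(record.findtext("applications"))
        for record in root.iter("record")
    }


csv_totals = totals_from_csv(CSV_TEXT)
xml_totals = totals_from_xml(XML_TEXT)

print(csv_totals)                 # per-province counts from the CSV version
print(sum(csv_totals.values()))   # total across all provinces
print(csv_totals == xml_totals)   # both formats carry the same records
```

A real dataset would normally be read from a downloaded file rather than an inline string, and its accompanying documentation would define the actual column or element names.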
Uses of Open Data in Canada
In February 2013 over 900 developers, students, and open data enthusiasts across Canada participated in the Canadian Open Data Experience (CODE), a 48-hour hackathon. The teams competed to build apps using datasets from Canada's Open Government Portal, working under the theme of "Solving Problems and Increasing Productivity Through the Use of Open Data".
Participants developed over 100 apps. To see what they built, visit the Winner's Showcase.
For more examples of how open data is used within the Government of Canada and beyond, visit the Open Government Apps Gallery.
If you're interested in the Government of Canada's most popular datasets, the top visits by province, which department was the last to upload new data, and other analytics, visit Open Government analytics.
Benefits of Open Data
Support for innovation - Access to knowledge resources in the form of data supports innovation in the private sector by reducing duplication and promoting reuse of existing resources. The availability of data in machine-readable form allows for creative mash-ups that can be used to analyze markets, predict trends and requirements, and direct businesses in their strategic investment decisions.
Advancing the government's accountability and democratic reform - Increased access to government data and information provides the public with greater insight into government activities, service delivery, and use of tax dollars.
Leveraging public sector information to develop consumer and commercial products - Open and unrestricted access to scientific data for public interest purposes, particularly statistical, scientific, geographical, and environmental information, maximizes its use and value, and the reuse of existing data in commercial applications improves time-to-market for businesses.
Better use of existing investment in broadband and community information infrastructure - Canada has invested in information and communications networks in the form of technical infrastructure and community services, such as libraries and social service agencies. This investment will continue to add value-for-money for Canadians by extending Web technology from a one-way communications medium to a collaborative environment.
Support for research - Access to federal research data supports evidence-based primary research in Canadian and international academic, public sector, and industry-based research communities. Access to collections of data, reports, publications, and artifacts held in federal institutions allows for the use of these collections by researchers.
Support informed decisions for consumers - Access to public sector service information supports informed decision-making; for example, real-time air travel statistics can help travelers choose an airline and understand the factors that can lead to flight delays. Giving Canadians their say in decisions that affect them builds trust and credibility, and creates potential for innovation and value.
Proactive Disclosure - Proactively providing data that is relevant to Canadians reduces the number of access to information requests, e-mail campaigns and media inquiries. This greatly reduces the administrative cost and burden associated with responding to such inquiries.
Examples exist for most of these areas.
Open Data and You!
Think you are ready to use open data? There's no better time than the present. To learn more about how to work with open data now that you have downloaded it, visit Working With Data and Application Programming Interfaces.
If you can't find the data that you're looking for, let us know through Suggest a dataset, and we'll do our best to publish it in a timely manner.
Glossary
- Data
- Reinterpretable representations of information in a formalized manner suitable for communication, interpretation, or processing.
- Machine readable
- In a form that can be used and understood by a computer.
- Open data
- Structured data that is machine-readable, freely shared, used and built on without restrictions.
- Open government
- A governing culture that holds that the public has the right to access the documents and proceedings of government to allow for greater openness, accountability, and engagement.
- Open information
- Unstructured information that is freely shared without restrictions.
- Structured information
- Digital information residing in fixed fields within a repository.
- Unstructured information
- Digital information that is often created in free-form text using common desktop applications such as e-mail, word-processing, or presentation applications.