ATIP Tools Tech Corner - Introduction to the ATIP Online Request Service and the use of artificial intelligence

Introduction and background

Welcome to the ATIP Tools Tech Corner, where information and updates about the new ATIP Online Request Service will be shared. We will build this corner as we go, adding more information to explain our journey and fill you in on the status of the implementation of the service.

The ATIP Online Request Service is a simple, centralized website that enables users to complete access to information and personal information requests and submit them to any of the institutions that are subject to the Government of Canada’s Access to Information Act and Privacy Act.

This service will be implemented in an incremental, agile way. This means that a first version (we call this a "beta" version) has been released, with only a few institutions participating. This allows us to test the service, get user feedback, and fix problems as we go.

Onboarding institutions

Over the course of the next few years, all institutions subject to the Access to Information Act and the Privacy Act will onboard to the ATIP Online Request Service. "Onboarding" in this context means that the institutions will be set up to leverage all the features of the application, and users will be able to send initial access to information and personal information requests to them using only the application.

The onboarding plan has been broken down into phases, based on a strategy that takes into account the requirements of the different types of institutions that are involved in the project. The onboarding phases also map to specific target time frames and the planned iterations of the ATIP Online Request Service.

The following is an update on the state of onboarding, which shows the next institutions to be onboarded.

Onboarding status as of March 5, 2019
Partially onboarded: 133
Onboarded: 51

Figure 1: Onboarding Progress – Number of institutions by percentage onboarded as of March 5, 2019

Figure 1 - Text version

Percentage onboarded    Number of institutions
100% (table 1 note *)   51
76-99%                  5
51-75%                  84
26-50%                  21
1-25%                   9

Table 1 Notes

Table 1 Note *

Institutions onboarded at 100% are listed below.


The following institutions have been onboarded:

  • Administrative Tribunals Support Services of Canada
  • Atlantic Canada Opportunities Agency
  • Canada Agricultural Review Tribunal
  • Canada Industrial Relations Board
  • Canada School of Public Service
  • Canada-Nova Scotia Offshore Petroleum Board
  • Canadian Environmental Assessment Agency
  • Canadian Grain Commission
  • Canadian Heritage
  • Canadian Human Rights Commission
  • Canadian Institutes of Health Research
  • Canadian Northern Economic Development Agency
  • Canadian Radio-television and Telecommunications Commission
  • Canadian Transportation Agency
  • Civilian Review and Complaints Commission for the Royal Canadian Mounted Police
  • Communications Security Establishment Canada
  • Copyright Board of Canada
  • Crown-Indigenous Relations and Northern Affairs Canada
  • Farm Products Council of Canada
  • First Nations Tax Commission
  • Indigenous Services Canada
  • Infrastructure Canada
  • Military Grievances External Review Committee
  • Military Police Complaints Commission
  • National Battlefields Commission
  • National Energy Board
  • National Film Board of Canada
  • National Research Council Canada
  • Natural Resources Canada
  • Natural Sciences and Engineering Research Council of Canada
  • Northern Pipeline Agency Canada
  • Office of the Administrator of the Fund for Railway Accidents Involving Designated Goods
  • Office of the Administrator of the Ship-source Oil Pollution Fund
  • Office of the Commissioner of Lobbying of Canada
  • Office of the Correctional Investigator of Canada
  • Office of the Superintendent of Financial Institutions Canada
  • Parole Board of Canada
  • Patented Medicine Prices Review Board
  • Port Alberni Port Authority
  • Privy Council Office
  • Public Prosecution Service of Canada
  • Public Service Commission of Canada
  • Sept-Îles Port Authority
  • Social Sciences and Humanities Research Council of Canada
  • Status of Women Canada
  • Transportation Appeal Tribunal of Canada
  • Transportation Safety Board of Canada
  • Treasury Board of Canada Secretariat
  • Vancouver Fraser Port Authority
  • Veterans Review and Appeal Board Canada
  • Western Economic Diversification Canada

Artificial intelligence

In this update, we will explain how the ATIP Online Request Service is leveraging artificial intelligence (AI).

This update is also one of our first efforts to explain the use of AI in government, so please give us feedback so that we can understand how to make this as clear as possible. Contact us at open.ouvert@tbs-sct.gc.ca.

What is the impact of our use of artificial intelligence?

To assess the impact of our use of AI, we have used the Algorithmic Impact Assessment Tool.

The assessment tells us that our use of AI has little socio-economic impact on citizens and little impact on government operations.

Using artificial intelligence

The search functionality provided by the ATIP Online Request Service uses AI to improve the user experience.

The first instance where AI is used is when searching for information that may have already been released in response to another request. The search results are based on information readily available on the Open Government website.

The second instance where AI is used is when helping to identify which institution may have the information pertaining to the request. The search will recommend institutions that are most suitable for the type of request. The data used to make this recommendation comes from the following locations:

  • Open Government summaries
  • departmental reports
  • "scraping" on government websites
  • institutions’ ATIP web pages
  • Government of Canada taxonomies
  • unified master data organization schema
  • Part III of Departmental Results Reports for the 2016 to 2017 fiscal year

How are we using AI?

Ensuring that a web search finds all the correct documents can be a difficult task. The search system leverages machine learning to identify contextual and latent relationships that are more fundamental than keywords. To do this, the search looks at concepts and the relationship between past searches to improve result quality.

The search system that was developed uses advanced natural language processing and machine learning techniques to enhance searches across multiple sources, covering websites, forums and anything else that is publicly accessible. By going beyond simple word similarity and instead "understanding" the meaning of search terms, this solution can compare a user’s search needs to the corpus of documents in near real time, returning all relevant documents, or components of documents, that relate to a given search query or comparable document.

Synonyms, abbreviations and typos often mean that key documents go overlooked. By using advanced machine learning and natural language processing, the algorithm we built is able to read an entire corpus of documents (such as an enterprise website, or a course curriculum with its textbooks and related documents). After reading the documents, the AI search system is able to semantically "understand" the phrases and ideas, going beyond simple keyword matching.

AI algorithm

The following will give more technical information about the algorithm that was used:

  • category of algorithm used: natural language processing
  • models used: tf-idf (term frequency-inverse document frequency) and cosine similarity models
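To make the two models named above concrete, here is a minimal sketch, using scikit-learn, of how a user's request could be scored against previously released summaries with tf-idf vectors and cosine similarity. The document snippets, the query and the variable names are invented placeholders, not the service's actual data or code.

```python
# Minimal sketch: tf-idf vectors plus cosine similarity (scikit-learn).
# The documents and query below are invented placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "Summary of records released about federal bridge inspections",
    "Briefing note on grain export statistics for the Prairie provinces",
    "Correspondence concerning access to information processing times",
]
query = "bridge inspection reports"

vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(documents)   # one tf-idf vector per document
query_vector = vectorizer.transform([query])       # same vocabulary as the documents

scores = cosine_similarity(query_vector, doc_matrix)[0]
for doc, score in sorted(zip(documents, scores), key=lambda pair: -pair[1]):
    print(f"{score:.3f}  {doc}")
```

The documents with the highest cosine similarity to the request would be the ones suggested to the user.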

Tf-idf improvements

Tf-idf is a method to score how related two pieces of text are. The input text from the user will be matched against all documents previously released under ATIP. Public documents that have the highest scores will be suggested to the user.

The tf-idf algorithm begins by counting the number of words in the request that are also present in each public document. This count is then divided by how common each of the matched words is. This division reduces noise and accounts for the fact that common words such as "Canada" are likely to match many documents regardless of the ATIP request, so such a match isn’t as informative. It is more valuable to know that a less common word is found in both the user’s submission and the publicly available document.
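As a rough illustration of that idea (not the production scoring code), the toy sketch below counts the request words found in each document and down-weights each match by how many documents contain that word, so a match on a rare word contributes more than a match on a common one.

```python
# Toy version of the idea described above: count matched words, then
# divide each match by the number of documents that contain the word.
documents = {
    "doc_a": "canada bridge inspection report",
    "doc_b": "canada grain export statistics",
}
request = "bridge inspection canada"

doc_words = {name: set(text.split()) for name, text in documents.items()}
request_words = set(request.split())

def doc_frequency(word):
    return sum(1 for words in doc_words.values() if word in words)

for name, words in doc_words.items():
    matched = request_words & words
    score = sum(1.0 / doc_frequency(word) for word in matched)
    print(name, round(score, 2), sorted(matched))

# "canada" appears in both documents, so it adds only 0.5 per match,
# while "bridge" and "inspection" each add a full 1.0 to doc_a.
```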

At its core, tf-idf is a word-matching algorithm. Words that are similar but not identical in the query and the documents will not register as a match with tf-idf alone. In fact, tf-idf is what powers many off-the-shelf search platforms, including Apache Solr, which ATIP has suggested does not return relevant results. Therefore, a number of improvements to the tf-idf algorithm had to be made.

Stemming

The most common way of improving tf-idf is to use a technique called stemming. Stemming is the process of simplifying a word to its "stem" or root. For example, the root word of "stemming" is "stem." If we reduce all words to their base and then look for matches, we will count two words such as "fishing" and "fisher" as matching. This technique works similarly in English and French.
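Here is a minimal sketch of stemming using the Snowball stemmers that ship with NLTK, which support both English and French; the choice of NLTK here is an assumption for illustration, not a statement about the service's actual implementation.

```python
# Reduce words to their stems in English and French using NLTK's Snowball stemmers.
from nltk.stem.snowball import SnowballStemmer

english = SnowballStemmer("english")
french = SnowballStemmer("french")

for word in ["stemming", "fishing", "fisher"]:
    print(word, "->", english.stem(word))

for word in ["demandes", "documents", "renseignements"]:
    print(word, "->", french.stem(word))
```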

Stop words

As we move through the content to reduce words to their stem, we can also remove stop words. A stop word is a common word that does not contribute meaning to the phrase. For example, if we remove "the" and "a" from a sentence, we can still infer its general meaning. Removing stop words improves the speed of our algorithm and reduces false matches.
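A small sketch of stop-word removal is shown below; the tiny stop-word list is illustrative only, since a real system would use a fuller list such as those bundled with NLTK or scikit-learn.

```python
# Remove common words that carry little meaning before matching.
# This tiny stop-word list is illustrative only.
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "for", "in"}

def remove_stop_words(text):
    return [word for word in text.lower().split() if word not in STOP_WORDS]

print(remove_stop_words("A request for the records of the bridge inspection"))
# -> ['request', 'records', 'bridge', 'inspection']
```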

Word embeddings

Stemming is useful when two words share the same root. But often there are words that are practically the same but do not share a common root. For example, "Access to Information and Privacy" and "ATIP" have the exact same meaning but share no words in common. In order for tf-idf to register matches for similar words, we need a way of measuring the similarity, or distance, between any two words. For example, "kids" and "children" should be close together, and "sheep" and "lion" should be far apart. In order to measure the distance between words, we can use a method called word embedding.

Word embedding is a tool that converts a word to a vector. Typically, this vector has hundreds of dimensions. We tend to use word embeddings that have 100 to 300 dimensions. Despite having a high number of dimensions, we can calculate the distance between any two words the same way we calculate distance in a smaller number of dimensions.
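To make the idea of distance between word vectors concrete, here is a sketch using made-up three-dimensional vectors and cosine similarity; real embeddings (such as word2vec, GloVe or fastText vectors) have 100 to 300 dimensions, but the calculation is the same.

```python
# Cosine similarity between word vectors. The 3-dimensional vectors below are
# invented for illustration; real embeddings have hundreds of dimensions.
import numpy as np

embeddings = {
    "kids":     np.array([0.9, 0.1, 0.0]),
    "children": np.array([0.8, 0.2, 0.1]),
    "sheep":    np.array([0.1, 0.9, 0.0]),
    "lion":     np.array([0.0, 0.1, 0.9]),
}

def similarity(a, b):
    va, vb = embeddings[a], embeddings[b]
    return float(np.dot(va, vb) / (np.linalg.norm(va) * np.linalg.norm(vb)))

print("kids / children:", round(similarity("kids", "children"), 3))  # close to 1
print("sheep / lion:   ", round(similarity("sheep", "lion"), 3))     # close to 0
```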

To combine tf-idf and embeddings, we convert every word to a vector by way of an embedding. We then measure the distance between every word in the user’s request (source) and every word in the public document (target). Words that are very close together are given a score close to 1 (or exactly 1 if they’re the same word), and words that are very far apart are given a score of 0. In this way, a word will be considered a match if the meaning of the word is similar.
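The sketch below shows one way this soft matching could look (again with invented low-dimensional vectors, not the production code): each request word is scored against its best-matching document word, and similarities below a cut-off are treated as 0.

```python
# Soft word matching: a request word counts as a match if the closest
# document word is similar enough in embedding space. Vectors are invented.
import numpy as np

embeddings = {
    "atip":        np.array([0.7, 0.3, 0.0]),
    "access":      np.array([0.6, 0.4, 0.1]),
    "information": np.array([0.5, 0.5, 0.0]),
    "bridge":      np.array([0.0, 0.2, 0.9]),
}

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def soft_match_score(request_words, document_words, cutoff=0.8):
    score = 0.0
    for rw in request_words:
        best = max(cosine(embeddings[rw], embeddings[dw]) for dw in document_words)
        score += best if best >= cutoff else 0.0  # near-identical words count, distant ones don't
    return score

print(soft_match_score(["atip"], ["access", "information"]))  # strong match
print(soft_match_score(["atip"], ["bridge"]))                 # no match
```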

Dimensionality reduction

A word embedding converts a single word to a vector of hundreds of numbers. This is done for all words in all publicly available government documents. Ultimately this generates a tremendous amount of data, and this data must be searched and analyzed with every ATIP request. We can reduce the amount of computation (and therefore increase search performance) by using an algorithm called singular value decomposition (SVD). In short, SVD can be used to compress the information in each document (and total data generated) while still retaining the information and search accuracy.
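As a minimal sketch of this compression step, the example below applies scikit-learn's TruncatedSVD to a tf-idf matrix; the document snippets and the number of components are placeholders. The reduced vectors are much smaller than the original ones but can still be compared with cosine similarity.

```python
# Compress tf-idf document vectors with truncated SVD, then compare in the
# reduced space. Documents and n_components are illustrative placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "records released about bridge inspections",
    "bridge inspection reports for federal structures",
    "grain export statistics for the Prairie provinces",
    "access to information processing times",
]

tfidf = TfidfVectorizer().fit_transform(documents)   # sparse, one column per word
svd = TruncatedSVD(n_components=2, random_state=0)   # keep only 2 latent dimensions
reduced = svd.fit_transform(tfidf)                   # dense, 4 documents x 2 numbers

print("original shape:", tfidf.shape)
print("reduced shape: ", reduced.shape)
print(cosine_similarity(reduced[:1], reduced))       # similarity of doc 0 to all docs
```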