National Collision Database National Collision Database (NCDB) – a database containing all police-reported motor vehicle collisions on public roads in Canada. Selected variables (data elements) relating to fatal and injury collisions for the collisions from 1999 to the most recent available data. 2013-06-13 2017-01-06 Transport Canada open-ouvert@tbs-sct.gc.ca TransportRoadRoad SafetyCollisionVehicleVehicle Collisionscasualtycasualtiesfatalityfatalitiesinjuryinjuriestraffic collisiontraffic fatalitytraffic injurytraffic collisionstraffic fatalitiestraffic injuriesmotor vehicle National Collision Database 1999 to 2014CSV http://data.tc.gc.ca/extracts/NCDB_1999_to_2014.csv Data Dictionary NCDB DOC http://data.tc.gc.ca/extracts/Data_Dictionary_NCDB_EN.doc National Collision Database 1999 to 2014ZIP http://data.tc.gc.ca/extracts/NCDB_1999_to_2014.zip National Collision Database 2014CSV http://data.tc.gc.ca/extracts/ncdb_2014.csv National Collision Database 2014ZIP http://data.tc.gc.ca/extracts/NCDB_2012.zip

National Collision Database

National Collision Database (NCDB) – a database containing all police-reported motor vehicle collisions on public roads in Canada. Selected variables (data elements) relating to fatal and injury collisions for the collisions from 1999 to the most recent available data.

Resources

Resource Name Resource Type Format Language Links
National Collision Database 1999 to 2014 Dataset CSV English
French
Access
Data Dictionary NCDB Dataset DOC English
French
Access
National Collision Database 1999 to 2014 Dataset ZIP English
French
Access
National Collision Database 2014 Dataset CSV English
French
Access
National Collision Database 2014 Dataset ZIP English
French
Access

Comments (38)

Hi, the date column is missing, so we cannot drill-down the collisions by date. If we have at least the ISO week then we can somehow generate the calendar date. It will be worth to have this date. Thanks, Asparuh

Hello There: Apparently there is no data set or breakdown of data for collision by region/province. I would have thought this would be readily available since these reports were gathered by jurisdictions? Makes it a little hard for comparative analyses. For e.g. comparing driving behavior/social activities of <25year olds in BC and Ontario, cannot possibly be achieved using your data? Any help? Please?

Hi, thank you for your comment. I have forwarded your inquiry to the data owner for their response. Stay tuned! Momin, the Open Government team

I am a student at Queen's University and I am trying to gather data to find a relationship between Blood Alcohol Concentration and fatal car accidents in Canada. I was wondering if this data contained information on blood alcohol levels of the drivers or any information regarding BAC. If not, does anyone know where I would be able to access information like that for Canada? Thanks in advance!

I have realized that the number of records for year 2014 in the files {ALL DATA, 2014 export} differs, so that could be the reason for deviations of the report measures. Finally I have managed to generate the sequences, so I will rate this open data set to 4 of 5. It will be nice if we have more data like vehicle make/model and what/who caused the collision according to the police reports. Please find the script: -- GENERATE SEQUENCES FOR THE MEASURES (NUMBER OF {COLLISIONS, VEHICLES, PERSONS}) ALTER TABLE COLLISIONS ADD C_ID INT IDENTITY(1,1); ALTER TABLE COLLISIONS ADD COL_SEQ INT; ALTER TABLE COLLISIONS ADD VEH_SEQ INT; ALTER TABLE COLLISIONS ADD PER_SEQ INT; -- VEH_SEQ WITH TMP1 AS ( SELECT VEH_SEQ, DENSE_RANK() OVER (ORDER BY (C_ID-P_ID)) AS RNK FROM COLLISIONS WHERE 1=1 ) UPDATE TMP1 SET VEH_SEQ = RNK ; -- COL_SEQ WITH TMP2 AS ( SELECT COL_SEQ, DENSE_RANK() OVER (ORDER BY START_C_ID) AS RNK FROM ( SELECT C_ID, COL_SEQ, (SELECT MAX(C_ID) AS END_C_ID FROM [DBO].[COLLISIONS] X WHERE X.V_ID=1 AND X.P_ID=1 AND X.C_ID<=C.C_ID) AS START_C_ID FROM [DBO].[COLLISIONS] C ) AS B ) UPDATE TMP2 SET COL_SEQ = RNK ; -- PER_SEQ WITH TMP3 AS ( SELECT PER_SEQ, ROW_NUMBER() OVER (ORDER BY C_ID) AS RNK FROM COLLISIONS ) UPDATE TMP3 SET PER_SEQ = RNK ; -- VERIFY THAT ALL IS FINE (TEST CASES) -- TC1 (Display some records and obtain totals by analytic functions) SELECT C_ID AS COLLISION_RECORD_ID, V_ID AS VEHICLE_RECORD_ID, P_ID AS PERSON_RECORD_ID , C_YEAR, C_MNTH, C_WDAY, C_HOUR, C_SEV , COL_SEQ, VEH_SEQ, PER_SEQ -- COUNT(DISTINCT COL_SEQ) OVER () -- Msg 10759, Level 15, State 1, Line 1385 Use of DISTINCT is not allowed with the OVER clause. , DENSE_RANK() OVER(ORDER BY COL_SEQ) + DENSE_RANK() OVER(ORDER BY COL_SEQ DESC) -1 AS NUMBER_OF_DEADLY_COLLISIONS_FOR_2014_AUG_SUN_15H16H , DENSE_RANK() OVER(ORDER BY VEH_SEQ) + DENSE_RANK() OVER(ORDER BY VEH_SEQ DESC) -1 AS NUMBER_OF_VEHICLES_INVOLVED_FOR_2014_AUG_SUN_15H16H , DENSE_RANK() OVER(ORDER BY PER_SEQ) + DENSE_RANK() OVER(ORDER BY PER_SEQ DESC) -1 AS NUMBER_OF_PERSONS_INVOLVED_FOR_2014_AUG_SUN_15H16H FROM COLLISIONS WHERE 1=1 -- C_ID BETWEEN 3392246 AND 3894769 AND C_YEAR = 2014 AND C_MNTH = 8 AND C_WDAY = 7 AND C_HOUR = 16 AND C_SEV = 1 ORDER BY C_ID ; -- TC2 (Compare vs. NCDB ONLINE Reports for 2014) SELECT C_YEAR as 'YEAR' , count (distinct COL_SEQ) AS NUMBER_OF_DEADLY_COLLISIONS_FOR_2014_AUG_SUN_15H16H , count (distinct VEH_SEQ) AS NUMBER_OF_VEHICLES_INVOLVED_FOR_2014_AUG_SUN_15H16H , count (distinct PER_SEQ) AS NUMBER_OF_PERSONS_INVOLVED_FOR_2014_AUG_SUN_15H16H FROM COLLISIONS group by C_YEAR order by 1 ; -- TC3 (Measures for year 2014) select C_YEAR 'Collision Year', count(COL_SEQ) 'TOTAL_NUMBER_OF_COLLISIONS', count(VEH_SEQ) 'TOTAL_NUMBER_OF_VEHICLES_IN_COLLISIONS', count(COL_SEQ) 'TOTAL_NUMBER_OF_PERSONS_IN_COLLISIONS' from COLLISIONS group by C_YEAR ;

Hi, thank you for your comments. I have forwarded your feedback to my colleagues at Transport Canada for their collective information. Regards, Momin, the Open Government team.

Hi team, the V_ID and P_ID columns should be unique, otherwise the data is useless. I have partial luck by using this query, but it does not solve the reporting problem: -- SET THE COLLISION GROUP ID (SQL Server Syntax) alter table ncdb_2014 add COLLISION_GROUP_ID int; ALTER TABLE ncdb_2014 ADD C_ID INT IDENTITY(1,1); select top(100) * from ncdb_2014; ---- actually it is not fully correct, because the query use only P_ID, but not {V_ID, P_ID} to group the records for one and the same collision with tmp as ( select C_ID, (C_ID-P_ID) grp, C_YEAR, C_MNTH, C_WDAY, C_HOUR, V_ID, V_YEAR, P_ID, COLLISION_GROUP_ID, dense_rank() over (order by (C_ID-P_ID)) AS rnk from ncdb_2014 where 1=1 /* -- test case: and C_YEAR = 2014 and C_MNTH = '8' and C_WDAY=7 and C_HOUR=16 and C_SEV = 1 */ -- order by C_ID ) UPDATE tmp SET COLLISION_GROUP_ID = rnk ; -- (295566 row(s) affected) -- FATAL COLLISIONS REPORT select C_YEAR, C_SEV, count(distinct COLLISION_GROUP_ID) from COLLISIONS where 1=1 and C_YEAR = 2014 and C_SEV = 1 and P_ISEV = 3 group by C_YEAR, C_SEV order by 1,2 ; -- 1746 -- we have an error i.e. for y. 2014 -> 1746 vs. 1667 in the original report

Dear OC Team, Current datasets contains V_ID, P_ID columns, but those columns are not unique and this is big problem to create any reports. What is the SQL query to find the number of [collisions, cars involved, persons involved] per year (or some other dimension)? Could we have UNIQUE sequence numbers for the Cars, Persons and Collisions e.g. CAR_SEQ {1,2,..,100000,..}, PERSON_SEQ {1,2,..,100000,..}, COLLISION_SEQ {1,2,..,100000,..}? Thanks, Asparuh

I would like some clarification on the codes for the data set. Does P_AGE mean the age of the driver (when P_PSN is 11)? Also - can you indicate what is meant by Person Sequence Number (P_ID)? Thanks.

Hi Eric, thanks for your comment. We have forwarded your question to the data owner for their response. Stay tuned! Momin, the Open Government team

Indeed, I fail to see why we can't get the exact day of each accident, this would be VERY useful.

Hi Yves, we have contacted the data owner regarding this and will provide an update as soon as we can. Stay tuned! Momin, the Open Government team.

The collision level data includes the "year", the "month" and the "day of week", but there is no "day of month" why is that?

It looks like motorcycle drivers have improperly encoded field P_SAFE in many cases: I have tried to see statics on accidents involving motorcycles, to see how type of protective gear correlates with the severity of injury. Unfortunately, the information in the database seem to be incomplete. I made a subset of data, selecting the rows where (P_PSN=="11"P_PSN=="12"P_PSN=="96") & P_ISEV!="N" & P_ISEV!="U" & V_TYPE=="14" & P_SAFE!='UU' & P_SAFE!='QQ' & P_SAFE!='NN' & P_SEX!='N' & P_SEX!='U' And the summary of the resulting distribution of P_SAFE record shows following (replacing codes with values from the data dictionary): No safety device used or No child restraint used 2063 Safety device used or child restraint used 2282 Helmet worn For motorcyclists, bicyclists, snowmobilers, all-terrain vehicle riders 57773 Reflective clothing worn For motorcyclists, bicyclists, snowmobilers, all-terrain vehicle riders and pedestrians 37 Both helmet and reflective clothing used For motorcyclists, bicyclists, snowmobilers, all-terrain vehicle riders and pedestrians 0 Other safety device used 86 No safety device equipped e.g. buses 4468 Data element is not applicable e.g. dummy person record created for parked cars 0 Choice is other than the preceding values 36 Unknown e.g. applies to runaway cars 0 Jurisdiction does not provide this data element 0

When will you be adding 2014,2015 and 2016 data. This is an excellent data base. I would love to see GPS coordinates added.

James, This from the data holder: Transport Canada is in the process of updating the dataset to include up to and including 2014 data. More current data is not yet available in a consolidated and complete state. Please check back for updates at a later time. Karin - the open-ouvert team

will there be a way to search for vehicles themselves? ie search for the VIN to find out if a car was in an accident?

Hello, Thank you for your question. I have sent this to the data owner for clarification. Thanks, Tasha, the Open Government Team

In this day and age of open data (open gov), the lat/lon information should be made available. May police departments are starting to provide their crime data (not just collisions) via open data portals. Check out http://data.vancouver.ca/datacatalogue/crime-data.htm as an example.

Dave, Thanks for your comment. I have shared your views with the data holder for their consideration. Karin from the open-ouvert team

Thank you!

Hi Dave, The following response has been posted on behalf of the data owner: "Thank you for your comment. The National Collision Database is in the process of trying to collect and report on additional data elements including ‘Collision Location – Geocode’. However, at this time the jurisdictions responsible for collecting these data have not been able to collect the location data on a sufficiently complete basis, so we are unable to compile and release location data at this time." -Tasha, the Open Government Team

I notice you state Bilingual in the various formats of the dataset, but don't see any French values. Do you have a list of all elements with French values?

Fantastic Dataset. I echo the wish for Jurisdiction or provincial information but understand if that needs to be limited. Do you know if this will be updated past 2012? Thank you!

Hi Ray, the response from Transport Canada: "Thank you for your inquiry. At this time, the data are limited to national-level only. We are in the process of compiling the 2013 data so we expect to make these data available in the near future. Thank you."

To my understand from the previous reply, there are multiple rows representing same incidents(one collision will possible generate multiple rows due to multiple injuries. Is there a way to identify which multiple row represents the same incidents so i can summarize them into one?

294MB file size? Surely there's a better way to present this data.

This table is at the person level. this data is really hard to work with. I would really like to see the creation of a Incident level ID and a vehicle level ID so I can see how many vehicles/persons were involves in a collision. The IDs in the dataset are merely counts of number of people and vehicles.

Thank you for your comment. The following is a response from the data owner : Thank you for your comments. We will look into the feasibility of incorporating collision and vehicle level IDs into future iterations of this data product. In the meantime, please feel free to visit our online data tool at http://www.tc.gc.ca/VehicleCollisions. You could also contact Transport Canada’s Evaluation and Data Systems team directly via ncdb-bndc@tc.gc.ca

more clarification is needed about the data elements. medical fatalities versus collision severity fatalities is not clearly defined

Thank you for your comment. The following is a response from the data owner : If we understand the comment correctly, our response is that the difference between C_SEV and P_ISEV can be found in the data dictionary provided to the portal (and copied below). For clarification purposes, please note that C_SEV is at the collision level and that P_ISEV is at the person level. If there was something else, please re-phrase the comment/question. C_SEV Code Description 1 Collision producing at least one fatality 2 Collision producing non-fatal injury U Unknown X Jurisdiction does not provide this data element P_ISEV Code Description 1 No Injury 2 Injury 3 Fatality Died immediately or within the time limit. N Data element is not applicable e.g. “dummy" person record created for parked cars U Unknown e.g. applies to runaway cars X Jurisdiction does not provide this data element

What does each record in the dataset represent? Does it represent a car involved in the accident or a person involved? I mean if two cars collided, each carrying two people, will there be 4 records associated to that accident, or only two? I have ruled out the possibility that each record represents one collision.

Thank you for your comment. The following is a response from the data owner: The National Collision Database is a hierarchal dataset with three main segments containing collision-level, vehicle-level and person-level information. Each record represents a person involved. In the example provided, where 2 vehicles each have 2 persons, there would be 4 records.

It would also be great, if longitude and latitude are not available, to get some sense of where these collisions took place - even by province. Especially since motor vehicle licensing is by province, this way you could compare laws and policies with changes in collision rates over time.

The follow response has been posted on behalf of Transport Canada : "It has been determined by the jurisdictions that provide this data that province/territory is among a number of variables that cannot be shared. As a result, the information that is published must remain at the national summary level.” Thank you, Data.gc.ca

The file is HUGE. It will be good if you separate it by year or API that makes it searchable by specific value as most people would like to view data with regard to specific type :car type, certain age, certain year...etc

Would be really cool if you could add two fields: longitude/latitude of vehicle involved in collisions. You could make some really interesting maps with these data. Make it easier to identify collision hot spots. Could also link this to time component to see how collision location trends change over time.

[The following comment is posted on behalf of Transport Canada.] A good comment. Unfortunately, in our current database, the longitude and latitude are not available. It is something we hope the jurisdictions (who supply the collision data) will eventually be able to send us in the future.