Teaching Open Data
Mike Smit is a professor in the School of Information Management at Dalhousie University in Halifax, Canada. His research and teaching explore data management for open and big data, data literacy, the effect of open information on civic engagement, and the interaction of information and emerging technology (including cloud computing and the Internet of Things).
One of the great things about open data is that it's not usually released with a specific purpose in mind. We can't predict what uses people will find for raw data; the power of open data is in unanticipated uses which move beyond the interests, scope, or capabilities of governments.
I use open data in my teaching. As a professor in Dalhousie's School of Information Management, I teach courses to students in our Master of Library and Information Studies (MLIS) and mid-career Master of Information Management degrees. Working with data is an important part of both degrees, and the effective visualization of data is one key learning outcome.
I've been asking students to visit open.canada.ca, find a dataset, and use an effective data visualization to tell me something interesting. What does "something interesting" mean? Fortunately for students, I find lots of things interesting. What I want is to learn something I didn't know, and that isn't obvious just by looking at the data.
Every semester, I have been astounded at the creativity of students who scour Canada's Open Data portal in search of a dataset that captures their interest. For students from Canada, it's an opportunity to better understand their home country; for students from outside the country, it's a chance to learn a bit more about their host country.
Thinking more broadly about the objectives of the assignment, it is worth reflecting on what we expect a modern workforce, and a modern pool of graduates, to know about working with data. The increasing interest in open data, combined with the problem of big data and the power of data science and data analytics, suggests the world is growing more and more data rich. But raw data is of limited use; we unlock the potential of data when we can analyze it, visualize it, create information and knowledge from it, and ultimately inform evidence-based decision making.
I'm part of a team of researchers at Dalhousie that was recently awarded Social Sciences and Humanities Research Council (SSHRC) funding for an initial look at the question "How can post-secondary institutions in Canada best equip graduates with the knowledge, understanding, and skills required for the data-rich knowledge economy?" What levels of what we call "data literacy" will we want Canadians to have as we consider the future of open data and open government?
These are big questions; for now, I'll say that an open data visualization assignment is a good start. I've linked to a copy of my assignment which anyone is welcome to use and adapt. Below, I've embedded some of the data visualizations MLIS students produced so you can see how students rose to the challenge of distilling complex datasets into easily absorbed messages.
Emily Colford (MLIS 2015) used data from the Canadian Ice Thickness Program, which measured sea ice thickness from 1947 to 2002. Because she used lighter-colored lines for more recent data, you can clearly see the decline in thickness over time (though each individual line is difficult to identify, the focus is the overall trend).
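The color choice described above can be sketched in a few lines of Python. This is a minimal illustration, not Emily's actual code: the function name `year_to_grey` and the darkest and lightest shade values are assumptions made for the example, and only the 1947 to 2002 range comes from the dataset description.

```python
def year_to_grey(year, first_year=1947, last_year=2002,
                 darkest=40, lightest=220):
    """Map a year to a hex grey: older years are darker, recent years lighter.

    The shade limits (40 and 220 on a 0-255 scale) are illustrative assumptions.
    """
    if not first_year <= year <= last_year:
        raise ValueError(f"year {year} is outside {first_year}-{last_year}")
    # Fraction of the way through the time range, from 0.0 to 1.0.
    fraction = (year - first_year) / (last_year - first_year)
    level = round(darkest + fraction * (lightest - darkest))
    return f"#{level:02x}{level:02x}{level:02x}"


# One color per yearly line; each can be passed to a plotting library.
colors = {year: year_to_grey(year) for year in range(1947, 2003)}
print(colors[1947], colors[2002])  # prints "#282828 #dcdcdc"
```

Because every line gets a shade tied to its year, the eye reads the overall drift from dark to light rather than any single line, which is the effect the chart relies on.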
Carlisle Kent (Master of Resource and Environmental Management/MLIS 2016) used CMHC data to compare mortgage rates with average rent in the city of Halifax over the past 25 years; this shows the value of buying property rather than renting it, and how that value has changed over time.
Harrison Enman showed us that the digital divide (a separation between people with regular Internet access and those without) is at its greatest among senior citizens whose family incomes are in the lowest quartile (the bottom 25%).
N.B. Ordinarily one would not connect these categories with a line graph, but the visual effect that results excuses this faux pas.
Keriann Dowling (MLIS 2014) pointed out, tongue in cheek, that a far higher percentage of men in prison are single than of men in the general population.
Finally, in another light-hearted analysis, Andrea Kampen (MLIS 2015) wondered if there might be a relationship between the amount of money Canadians spend on alcohol and unemployment levels; at a glance, it certainly appears that when more Canadians have jobs, we spend more money on alcohol. I will leave the reader to form their own conclusions for this one!
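For readers who want to probe a relationship like this one, a common first check is whether the correlation survives once the shared trend over time is removed. The sketch below uses made-up numbers, not the alcohol or employment data from the portal; the series names and values are purely hypothetical. It compares the Pearson correlation of the raw series with the correlation of their year-over-year changes.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def yearly_changes(values):
    """Year-over-year differences, which strip out a steady trend."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


# HYPOTHETICAL illustrative values, not real data from any portal.
employed = [100, 103, 104, 108, 109, 113, 114, 118]
alcohol_spending = [50, 51, 54, 55, 58, 59, 62, 63]

raw_r = pearson(employed, alcohol_spending)
change_r = pearson(yearly_changes(employed), yearly_changes(alcohol_spending))
print(f"raw series:     r = {raw_r:.2f}")
print(f"yearly changes: r = {change_r:.2f}")
```

With these invented numbers the raw series are strongly positively correlated because both rise over time, while their yearly changes are negatively correlated, so the shared trend alone accounts for the apparent link. Real data might behave differently, and a check like this suggests where to look rather than settling the question.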
Mike Smit - February 08, 2016
Naomi, thanks for your comments - the short response is you do not need to be concerned. A blog post is a very small window into a large curriculum, and while I chose to focus on the "have fun with data" portion of one assignment, being critical consumers and users of data is a core part of my course and the broader curriculum. For example, we talk about spurious correlations, and how data exploration can show correlations and aid in the development of theories, but cannot on its own account for various moderating/confounding variables, and many other aspects of being critical thinkers. We talk about how the data we have is the tip of an iceberg, and all the ways in which data can deviate from reality (just as five images and a blog post can give the wrong idea about the depth of a curriculum!).
Even in the context of data visualization, we talk about different audiences: are you trying to make a point, or inform generally? Are you exploring, or communicating?
I am glad you are interested and concerned, though! You may be interested in reading our report on data literacy, http://hdl.handle.net/10222/64578. I would welcome your input.
In case anyone else is concerned: this post is not intended to suggest any kind of conclusions about the data. This is about people having fun playing with open data, learning something basic from that data, and more importantly learning about working with and manipulating data in numeric and visual form.
Naomi Bloch - February 01, 2016
I commend and support this direction in LIS programs. We should be teaching students how to act as responsible data interpreters -- not just for "innovation" purposes, but so they can help their communities begin to use such information to hold parties to account and to address community needs.
I would prefer it if we could do this responsibly, by ensuring that basic numeracy lessons, and a fundamental understanding of statistics and research methods are not divorced from the process. Such understanding is necessary so that visualisations are not just "fun" but also reasonable interpretations of reality.
Based on the examples in this post, I have some concerns. Most of the above examples appear to be "lying with statistics" -- implying ready associations between semi-arbitrary variables, overlooking confounding variables or methodological data constraints, graphically representing different indices/scales as though they're comparable units of measurement, representing categorical variables as though they're continuous variables, etc. Not to mention a lack of labeling and some missing source citations. Essentially, the students' work (as presented) appears to be a showcase of everything we're afraid of when it comes to sharing data.
Maybe this can be addressed by discussing all these issues after the students' "first pass" and then having them re-visit the assignment. But the aim as a whole requires more than a data visualisation course if we hope to produce competent data stewards, facilitators, and users. I hope that we are moving in that direction.