Note: This is a technical examination of available COVID-19 open data. For general information please visit your relevant local website, including:
Summary
Sometimes when advocating for a particular change, it can be hard to step back and recognize the progress that has been made.
COVID-19 open data in Ontario is available and is provided in a way that meets many of the goals that open data advocates had set out. The City of Ottawa and the Province of Ontario both prominently link directly to COVID-19 open data on their dashboard pages. The Government of Canada also links to open data, but not as prominently or consistently. There is also an academic site providing data and code for Ottawa. All sites provided clear licensing.
These data visualisations and open data can be considered a kind of infrastructure. In the context of academic use of open data, particularly with the addition of open source computer code, these sites can even be considered part of a digital research infrastructure.
Goals for Open Data
There were several elements that were envisioned for open data in general:
- available
- open license
- usable - clear descriptions
- usable - machine readable
- visible, in the sense that clear links to underlying data would be provided e.g. on websites
- discoverable through search
Basically, the goal was that anyone could find the data and use it; that it would provide a usable machine-readable alternative to scraping HTML web tables or hand-extracting data from PDF or Word documents; and that it would be provided by every organisation at every level of government.
For sustainability, it was also hoped that open data would be part of each organisation's own internal processes:
- used by the service providers themselves
There is an additional key element that, I have to say, wasn't strongly considered by many of the initial advocates. There should be tools provided to view and work with the data, without having to build your own spreadsheets or write your own code:
- tools are provided for anyone to work with the data
And I would add lastly, as the community thought through the provision of tools (e.g. website visualisations):
- the tools themselves are made available for open reuse, e.g. you can access the underlying code and pipelines that are used to make the website views
These goals sometimes seemed very far off. But with COVID-19 data in Ontario, I would consider it a success.
COVID-19 open data is available at the municipal level (e.g. City of Ottawa), at the provincial level, and federally. (And beyond, through international aggregation sites.)
City of Ottawa
Ottawa COVID-19 Open Data
Ottawa's open data site is https://open.ottawa.ca/
COVID-19 open data can be found just using "COVID-19" as a search. In this case I've also filtered for just CSV format files.
https://open.ottawa.ca/search?q=covid-19&type=csv
There are 8 datasets, including:
- COVID-19 Cases and Deaths in Ottawa
- COVID-19 Weekly Cases and Rates by Age in Ottawa (Last 6 Weeks)
- COVID-19 Reproduction Number (R(t))
- Data tables for Public COVID-19 Maps
The only one that doesn't seem to be updated frequently is "Data tables for ONS Neighbourhood COVID-19 Maps".
So the good news is the data exist, the data are being kept up-to-date, and the data are in a usable, machine-readable format.
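As a minimal sketch of what "machine-readable" means in practice, the following Python reads a CSV export using only the standard library. The download URL and the column names (`Date`, `Daily_Cases`) are hypothetical placeholders for illustration, not the actual Open Ottawa schema; check the dataset page for the real link and headers.

```python
import csv
import io
import urllib.request

# Hypothetical placeholder: copy the real CSV link from the dataset's page.
CSV_URL = "https://example.org/covid19-cases-ottawa.csv"


def fetch_csv_text(url):
    """Download a CSV file and return its contents as text."""
    with urllib.request.urlopen(url) as response:
        # utf-8-sig also strips a byte-order mark if the export has one
        return response.read().decode("utf-8-sig")


def parse_rows(csv_text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def latest_row(rows, date_column):
    """Return the row with the most recent ISO-formatted (YYYY-MM-DD) date."""
    return max(rows, key=lambda row: row[date_column])


# Small made-up sample so the sketch runs without network access.
SAMPLE = "Date,Daily_Cases\n2021-04-01,120\n2021-04-02,135\n2021-04-03,128\n"

rows = parse_rows(SAMPLE)
print(latest_row(rows, "Date"))
```

To try it against a real dataset, replace `SAMPLE` with `fetch_csv_text(CSV_URL)` once `CSV_URL` points at an actual CSV link, and pass the real date column name to `latest_row`.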
So let's look at three more aspects:
- Is the data surfaced with tools for people to work with the data?
- Is the data clearly linked when it is used?
- Do the city's own websites/tools use the open data (and is it clear that they do so)?
Ottawa COVID-19 website
The main general use site is Ottawa Public Health - Statistics on Coronavirus 19 (COVID-19) in Ottawa. It has links to:
- Daily COVID-19 Dashboard
- COVID-19 Vaccination Dashboard
- Mapping Products
- Wastewater COVID-19 Surveillance
- Projections
The dashboard is in Microsoft Power BI, which I have seen used on a number of COVID-19 information sites.
The data pipeline is clear, but I don't think it is one that is available to the public: "Data are extracted from [the provincial Case and Contact Management (CCM) reporting system] at 3 pm daily and loaded to the dashboard the following day." In other words, it doesn't look like the Power BI visuals are generated directly from the Ottawa open data, but instead from the provincial closed system.
Open Data Clearly and Prominently Available
However, the really good news is right below the explanation of the data source used (CCM), and before showing the dashboard, the page links directly to the open data: "The following data tables are available for download on Open Ottawa".
So this meets a key goal: it makes the data a kind of equal to the dashboard, so that you can easily click the links and do your own analysis, whether in a spreadsheet, by writing code, or using other data analysis tools.
It's hard to overstate what a big change this is in how data is provided to the general public. Linking to the open data at all is a big deal, but linking to it right at the top before the visualisation is a really strong and clear statement of openness.
There is a gap in that, as far as I know, there's no way to access the underlying Power BI setup. This means that you can see the really complicated visuals, but you don't get much insight into how they're made or how you could use Power BI to make your own. (Incidentally, the setup generates five different pages; use the < > controls way down at the bottom of the Power BI frame on the page to see the other pages.)
There is also a gap in that, I guess due to different cut-off times for updates, the municipal dashboard numbers are different from the provincial ones.
In terms of discoverability there are basically two pathways to the open data, one directly from Open Ottawa and one from Ottawa Public Health, but they are not quite equal.
If you land directly on Open Ottawa, you would find out it has COVID-19 open data even without a search as they are surfacing several datasets (I would guess this is automated based on traffic to the datasets, but it could be manually curated).
But you wouldn't find wastewater data, because it's on a different site https://613covid.ca/wastewater/
The wastewater site uses more of an academic format, and presents all of the data visualisations up front, before you find out below that you can "See the Methods page for more information on how the samples were collected, access to the data, and how the plots were created."
There are some issues at this point that are more about web design and user-centric design than data. For example, the Methods page is actually called model, and you can't get directly to the top of the page by clicking the Methods menu item; you have to click "Projection models" within that menu. If you're looking for data you're more likely to pull down the Methods menu and go to "Plot data", but that won't take you directly to the open data. Basically you will have to navigate around the page quite a bit before you find out that the data are actually linked under "Wastewater surveillance". The data are made available on GitHub.
https://github.com/Big-Life-Lab/covid-19-wastewater
So not only could the wastewater site benefit from more directly and clearly linking to the GitHub data near to where it is shown in visualisations and tables, but you can't find the wastewater data at all if you start on the Open Ottawa site. So there is a gap in integrating these different open data sources. However, the use of GitHub does provide another path for discoverability, and adds the ability to easily share not just data but also code.
The situation for data is similar for COVID-19 projections, which can be found at
The source data for the projection models is the Ottawa open data, which is a good clear reuse of the data directly. The projection models themselves are drawn from a variety of clearly linked sources. You can see that the wastewater data is kind of added on to the models, which may be one reason for some content visibility issues in the website design. By navigating on GitHub I can also find that the 613covid.ca projections website is generated from
https://github.com/Big-Life-Lab/Ottawa-COVID-Projection
but I don't think you can find this on the projection website itself. It's good that the website content is on GitHub, as it means pull requests can be submitted with suggested changes; however, creating and submitting pull requests for a site built from GitHub is a high barrier for most users.
Province of Ontario
Ontario COVID-19 Open Data
The Province of Ontario's open data site is https://data.ontario.ca/
COVID-19 open data can be found just using "COVID-19" as a search. In this case I've also filtered for just CSV format files.
https://data.ontario.ca/dataset?keywords_en=COVID-19&res_format=CSV
There are 19 dataset pages, which may link to more than one dataset per page. For example, to find daily new cases you first go to the page
Status of COVID-19 cases in Ontario
and then select the dataset
(PHU means Public Health Unit.)
Having a full page means that a lot of explanatory and related information can be provided. The data website itself has good functionality including the ability to preview a CSV dataset on the website itself as a grid (spreadsheet-like view), as a graph, or as a map.
This is a great way to allow people to explore data without needing their own tools.
There is also an API available for the datasets.
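As a sketch of how the API could be used from code: the catalogue's URL patterns suggest it runs on CKAN, so the example below assumes the standard CKAN Action API (`package_search`) is exposed under `/api/3/action`. That base path is an assumption and should be checked against the catalogue's own API documentation. The sample response is made up so that the example runs offline.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the catalogue exposes the standard CKAN Action API at this path.
API_BASE = "https://data.ontario.ca/api/3/action"


def build_search_url(query, rows=10):
    """Build a CKAN package_search URL for a free-text query."""
    params = urllib.parse.urlencode({"q": query, "rows": rows})
    return f"{API_BASE}/package_search?{params}"


def csv_resources(search_response):
    """List (dataset title, resource name, url) for every CSV resource found."""
    found = []
    for dataset in search_response["result"]["results"]:
        for resource in dataset.get("resources", []):
            if resource.get("format", "").upper() == "CSV":
                found.append(
                    (dataset.get("title"), resource.get("name"), resource.get("url"))
                )
    return found


def search(query, rows=10):
    """Call the API over the network and return the decoded JSON response."""
    with urllib.request.urlopen(build_search_url(query, rows)) as response:
        return json.load(response)


# Made-up response in the shape CKAN returns, so the sketch runs offline.
SAMPLE_RESPONSE = {
    "success": True,
    "result": {
        "count": 1,
        "results": [
            {
                "title": "Example dataset",
                "resources": [
                    {"name": "cases.csv", "format": "CSV",
                     "url": "https://example.org/cases.csv"},
                    {"name": "notes.pdf", "format": "PDF",
                     "url": "https://example.org/notes.pdf"},
                ],
            }
        ],
    },
}

print(build_search_url("COVID-19"))
print(csv_resources(SAMPLE_RESPONSE))
```

With network access, `csv_resources(search("COVID-19"))` would list the CSV files behind each matching dataset page, which is one way around having to click through several catalogue pages by hand.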
Ontario COVID-19 website
The main general use site is COVID-19 (coronavirus) in Ontario.
Right below a website display of COVID-19 cases, it has a prominent button:
and next to it,
Download the raw data from the Ontario Data Catalogue
So you have two highly visible pathways to the data. The data and details page provides tables, maps and charts, so you can work directly with the data on the web without needing specialised tools. At the bottom of the data and details page, it again links to the underlying data, in a "Where the numbers come from" section. You are left a bit to navigate the data catalogue to figure out which datasets generate which charts. The page would benefit from adding a CSV download button with each visualisation.
For example, I had to try several different data catalogue pages and examine a number of datasets before finding the new cases dataset.
Nevertheless, this is overall a really strong open data success, with multiple really clear linkages to the open data.
If you land directly on the Ontario Data Catalogue, you would find out it has COVID-19 open data even without a search as they are surfacing several datasets both under What's New (which I'm guessing is manually curated) and in the lists of Most popular and Recently updated datasets.
I will note however that their custom 2019 Novel Coronavirus group includes many more datasets than you would find just by searching COVID-19. There is a pathway from a dataset to the group, but you would have to notice that there is a Groups option at the top of every dataset.
Government of Canada
Government of Canada COVID-19 Open Data
The Government of Canada open data site is https://open.canada.ca/en/open-data
COVID-19 open data can be found just using "COVID-19" as a search. In this case, I've filtered to include CSV, JSON, GeoJSON, XLS, XLSX, XML only:
With these filters, there are 118 datasets.
If you look at e.g. Cumulative number of COVID-19 vaccine doses administered in Canada by jurisdiction
https://open.canada.ca/data/en/dataset/3b75a8d6-c5a9-48f9-834b-626eec16363f
you can see a clear description of the dataset and a link to download it in CSV format, but no tools to work with the dataset on the page itself.
Government of Canada COVID-19 website
The main general website is Government of Canada - Coronavirus disease (COVID-19).
A page about data can be found linked right in the top section
Current situation - COVID-19 data trends
Once we get there though, the situation is somewhat mixed.
The first link, Daily epidemiological report, takes you to many visualisations. But it is really PDF bound, including a link to a PDF version of the page at the top, as well as (somewhat indirectly) linking to a weekly PDF summary (week of April 9, 2021 in this example).
There are some links to underlying open data, e.g. to CSV open data underlying "Current situation", but you have to be pretty sharp-eyed to spot them.
That being said, on the data trends page, there are two links in trend data that will take you to data:
- Downloadable data on COVID-19 across Canada takes you to the Government of Canada open data website, but it takes you to a single dataset, Public Health Infobase - Data on COVID-19 in Canada, rather than to a group, search, or summary page listing all of the 100+ COVID-19 datasets.
- You can also find CSVs linked from some of the visualisations in Visual data gallery: COVID-19 graphs, charts, maps and more.
Overall I would say that while kudos is due to the Government of Canada for linking to open data, it could take lessons from the City of Ottawa and the Province of Ontario in more prominently and consistently surfacing the open data as part of its websites.
Similarly, if you land directly on the Government of Canada open data website
https://open.canada.ca/en/open-data
you get no indication that it has COVID-19 open data, unlike the prominent highlighting of COVID-19 open data on both the Ottawa and Ontario websites.
International
I will mention very briefly Our World in Data, which aggregates and then republishes international open data. For COVID-19 specifically see
https://ourworldindata.org/coronavirus
There is a clear open license at the top of the web page, and there are visualisation tools (charts) provided.
Notably both the data and the code are made available on GitHub.
https://github.com/owid/covid-19-data/tree/master/public/data
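Because the aggregated data is a single CSV covering many countries, a common first step is to filter it down to one location. The sketch below assumes columns named `location`, `date` and `new_cases`, which is how I understand the file to be laid out; verify against the header row before relying on it. The sample rows are made up.

```python
import csv
import io


def rows_for_location(csv_text, location):
    """Return only the rows whose 'location' column matches exactly."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row["location"] == location]


def to_float(value):
    """Convert a CSV cell to float, treating blank cells as missing (None)."""
    return float(value) if value.strip() else None


# Made-up sample in the assumed layout; note the blank new_cases cell.
SAMPLE = (
    "location,date,new_cases\n"
    "Canada,2021-04-01,100\n"
    "France,2021-04-01,200\n"
    "Canada,2021-04-02,\n"
    "Canada,2021-04-03,150\n"
)

canada = rows_for_location(SAMPLE, "Canada")
series = [(row["date"], to_float(row["new_cases"])) for row in canada]
print(series)
```

Blank cells are kept as `None` rather than converted to zero, so that a missing report is not mistaken for a day with no cases.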
Sidebar: Coding and Open Data
It is now much easier to work directly with open data in modern programming languages, particularly with CSV format data. As an example, you can see my previous blog post about Jupyter Notebooks (Python coding) using Google Colaboratory. A Jupyter Notebook that I created to chart new cases of COVID-19 in Ottawa is available on GitHub: https://github.com/scilib/ontario_cv19_opendata_plot
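Daily case counts are noisy, so charts of new cases often show a smoothed line as well. The following is a sketch of a trailing 7-day moving average in plain Python. It is an illustration with made-up numbers, not the code from the notebook linked above.

```python
from collections import deque


def moving_average(values, window=7):
    """Return trailing moving averages; positions before a full window are None."""
    recent = deque(maxlen=window)  # automatically drops the oldest value
    averages = []
    for value in values:
        recent.append(value)
        if len(recent) == window:
            averages.append(sum(recent) / window)
        else:
            averages.append(None)
    return averages


daily_new_cases = [10, 20, 30, 40, 50, 60, 70, 80]  # made-up numbers
print(moving_average(daily_new_cases))
```

The resulting list lines up one-to-one with the input, so it can be plotted against the same dates as the raw daily counts.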
License for this blog post
This blog post is Copyright © 2021 Richard Akerman. The text and associated images are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
This license applies only to this blog post and not to the entire blog as a whole.