COVID-19 has unleashed a global tsunami of charts, graphs and maps, as well as predictive data models, all visualising a wealth of data. The American novelist Mark Twain, who was fascinated with science and scientific inquiry, popularised the phrase “lies, damned lies and statistics”. If he were alive today, he might have added poor data visualisations. In this blog we ask: what are some of the pitfalls of representing data visually, and how can they be overcome?
Let's start with the visualisations most people have come across in the last month: Coronavirus dashboards. The most reliable one is generally acknowledged to be the online Johns Hopkins Coronavirus Resource Center, which precisely compares data from around the world in near real time and is regularly cited by governments and media. In fact, many other Coronavirus dashboards take their data from the Johns Hopkins one.
Another notable example is EuroMOMO, which aims to detect and measure excess deaths related to seasonal influenza, pandemics and other public health threats. Its graphs and maps are clearly labelled, which increases confidence in the data and its credibility. Like Johns Hopkins, it explains its methodology and openly communicates caveats. A number of other sites have also popped up, including one developed by Avi Schiffmann, a high school pupil from Washington State. It just goes to show that inexpensive data visualisation tools enable anyone to dabble in data science. Consequently, a campaign for consistent standards in data visualisation is now being championed by the likes of Harvard, which has launched a project to capture and share best practices.
Presentation of core data, such as the number of confirmed cases, recoveries and deaths, can be misleading when scales are incomplete. The BBC was criticised for showing a bar chart in which the death rate for the over-80s, 15%, was plotted against a scale that ended at 15%, thus painting a visually alarming picture even though 85% survived. Extending the scale to the full 100% would have given the reader a more balanced visual representation of the data.
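To make the scale point concrete, here is a minimal Python sketch of how much of the chart a 15% bar fills under the two axis choices described above. The function names (`bar_fill_fraction`, `text_bar`) and the text rendering are our own illustration, not how the BBC chart was produced.

```python
def bar_fill_fraction(value: float, axis_max: float, axis_min: float = 0.0) -> float:
    """Return the fraction of the plotted axis range that a bar of `value` occupies."""
    if axis_max <= axis_min:
        raise ValueError("axis_max must be greater than axis_min")
    return (value - axis_min) / (axis_max - axis_min)


def text_bar(value: float, axis_max: float, width: int = 40) -> str:
    """Render a crude horizontal bar: '#' for the filled part, '.' for the rest."""
    filled = round(bar_fill_fraction(value, axis_max) * width)
    filled = max(0, min(width, filled))  # keep the bar inside the chart area
    return "#" * filled + "." * (width - filled)


# The 15% death rate for the over-80s quoted in the text.
death_rate_over_80 = 15.0

# Axis ending at 15%: the bar fills the entire chart.
print("scale 0-15%  ", text_bar(death_rate_over_80, axis_max=15.0))

# Axis ending at 100%: the same bar fills only 15% of the chart.
print("scale 0-100% ", text_bar(death_rate_over_80, axis_max=100.0))
```

The data is identical in both rows; only the axis limit changes, and with it the impression the reader takes away.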
Another key challenge is the disparity in how different countries (as well as states, regions etc.) define, capture and report key data on the number of confirmed cases, the number of people actually tested, patients requiring ICU treatment, those who have recovered and those who sadly die. Even cause of death becomes contentious when factoring in patients with ‘underlying health issues’, i.e. have they died as a direct result of COVID-19 or ‘with’ the virus? Such variations see Belgium having a mortality rate of 56.8 per 100,000, the highest in Europe. However, Belgium attributes all deaths in nursing homes to COVID-19 and so could be over-reporting, whereas the UK, until recently, mostly reported deaths which occurred in hospitals. So a basic bar chart can actually mask all kinds of anomalies and inconsistencies in the underlying data. In fact, many governments have now chosen to exclude data from China from their visualisations, as it lacks credibility and transparency.
The absence of rigorous testing data (notable exceptions being Germany, Singapore and South Korea) has made calculating the overall mortality rate something of a lottery, with some media outlets crudely and inaccurately taking the number of deaths as a percentage of confirmed cases. Even the WHO was stating back in March that the mortality rate would be 3.4%. However, it’s now estimated that 80% of those infected will be either asymptomatic or suffer only mild flu-like symptoms, and many of them may never be tested or counted as confirmed cases. We may only know the true rate after the pandemic, when more testing data is available to model. One early study has suggested that the mortality rate might be as low as 0.66%, compared with a rate of 0.1% for a typical winter flu outbreak and, according to the WHO, more than 50% for Ebola. Data taken out of context can lead to alarming and sensational claims, such as the survival rate for COVID-19 being 50%.
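A short Python sketch shows why the choice of denominator matters so much. Deaths divided by confirmed cases is usually called the case fatality rate, while deaths divided by all infections (including untested ones) is the infection fatality rate. Every number below is invented purely for illustration, including the assumed ratio of undetected to confirmed infections; none of it is real COVID-19 data.

```python
def fatality_rate_percent(deaths: int, denominator: int) -> float:
    """Deaths as a percentage of the chosen denominator (cases or infections)."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return 100 * deaths / denominator


# Invented figures for illustration only.
deaths = 340
confirmed_cases = 10_000

# Hypothetical assumption: for every confirmed case, four further infections
# were mild or asymptomatic and never tested.
assumed_undetected_per_confirmed = 4
estimated_infections = confirmed_cases * (1 + assumed_undetected_per_confirmed)

# Crude approach: deaths as a percentage of confirmed cases only.
case_fatality_rate = fatality_rate_percent(deaths, confirmed_cases)

# Same deaths, but measured against everyone assumed to be infected.
infection_fatality_rate = fatality_rate_percent(deaths, estimated_infections)

print(f"Case fatality rate:      {case_fatality_rate:.2f}%")
print(f"Infection fatality rate: {infection_fatality_rate:.2f}%")
```

With the same number of deaths, the headline rate falls fivefold once the assumed untested infections are included, which is why a figure quoted without its denominator says very little.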
Returning to the theme of testing and big data, we know that countries such as Israel, Australia and Singapore are introducing mobile phone apps for contact tracing. This raises all kinds of ethical and privacy questions around the use and storage of such data by governments. As we have seen with China’s social credit system, such data can be used in coercive ways to monitor and control citizens. Used in the right ways, however, visual data can be a catalyst for changing behaviours as we ease lockdown measures in the fight against the pandemic. Tracing data could be combined with existing city big data, e.g. as part of Polivisu's Czech visualisation (created by Innoconnect) showing COVID-19 cases by geography and demographic. Such an approach could help to improve collaboration and engagement between citizens and policy makers, so that better decisions are made in real time as we test out and ease the lockdown restrictions through 2020.