Social media for tracking disease outbreaks – fad or way of the future?
- Written by C Raina MacIntyre, Professor of Infectious Diseases Epidemiology, Head of the School of Public Health and Community Medicine, UNSW Australia
Social media has revolutionised how we communicate. In this series, we look at how it has changed the media, politics, health, education and the law.
Infectious diseases kill more than 17 million people every year. Large outbreaks, known as epidemics, are becoming more frequent. And more serious infections have emerged in the past decade than at any time previously.
The social and economic impacts of epidemics can be severe. SARS (severe acute respiratory syndrome), for example, cost the global economy US$54 billion.
There is also a growing risk of unnatural epidemics from bioterrorism, as a result of rapid advances in gene editing.
We need better surveillance systems to detect epidemics early. But while there is the potential to predict epidemics by mining data of rumours and news reports (rumour surveillance), or clusters of disease symptoms (syndromic surveillance) described by social media users, we’re not quite there yet.
Traditional disease tracking systems
Traditional disease surveillance relies on data obtained from doctors, hospitals or laboratories through formal reporting systems. This yields valid and accurate data about emerging outbreaks and the impact of control strategies such as vaccinations. But it’s often not timely.
Epidemics can rapidly spiral out of control. Take the 2014 outbreak of Ebola, for example. There was an exponential rise in cases between July and October, with cases doubling each day. Ten cases one day became 20 the next, 40 the day after and 640 within one week.
The earlier epidemics are detected, the easier they are to control. Detecting and acting on the Ebola epidemic early, when there were only ten cases a day, could have prevented more than 600 cases a week later.
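The doubling arithmetic can be checked with a short Python sketch. It simply applies the article's illustrative assumption that daily case numbers double every day, starting from ten; real outbreaks rarely grow this regularly.

```python
def daily_cases(start: int, days: int) -> list[int]:
    """Daily case counts if cases double every day (the article's illustrative assumption)."""
    return [start * 2 ** day for day in range(days)]


# Seven consecutive days, starting from ten cases on the first day.
cases = daily_cases(10, 7)
print(cases)                 # [10, 20, 40, 80, 160, 320, 640]
print(cases[-1] - cases[0])  # 630
```

By the seventh day there are 640 cases, 630 more than on the first day, which is where the figure of more than 600 extra cases a week later comes from.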
Rapid detection using social media
Digital data are now publicly available from many sources. People talk about epidemics on social media, using keywords such as “fever” and “infection”, before the outbreaks are officially identified.
A surveillance system for detecting outbreaks of Ebola using Twitter, for example, could set geospatial tags for specific locations such as the African continent. It could search for a cluster of terms on the Twittersphere such as “haemorrhage”, “fever”, “virus”, “Ebola”, “Lassa” (an illness that can be confused with Ebola).
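A minimal Python sketch of the geotag-plus-keyword idea might look like the following. The `Post` record, the rough latitude/longitude box around Africa and the two-term threshold are all hypothetical choices for illustration; a real system would take posts and coordinates from a platform API and tune these parameters.

```python
from dataclasses import dataclass


# Hypothetical post record; a real system would build this from a social media API.
@dataclass
class Post:
    text: str
    lat: float
    lon: float


# Rough bounding box around the African continent (an assumption, not an official boundary).
AFRICA_BOX = (-35.0, 38.0, -18.0, 52.0)  # (min_lat, max_lat, min_lon, max_lon)
EBOLA_TERMS = {"haemorrhage", "fever", "virus", "ebola", "lassa"}


def in_box(post: Post, box: tuple[float, float, float, float]) -> bool:
    min_lat, max_lat, min_lon, max_lon = box
    return min_lat <= post.lat <= max_lat and min_lon <= post.lon <= max_lon


def matched_terms(post: Post, terms: set[str]) -> set[str]:
    # Lower-case, split on whitespace and strip simple punctuation and hashtags from each word.
    words = {w.strip(".,!?:;\"'()#") for w in post.text.lower().split()}
    return words & terms


def flag_posts(posts, box=AFRICA_BOX, terms=EBOLA_TERMS, min_terms=2):
    """Return posts inside the box that mention at least `min_terms` distinct terms."""
    return [p for p in posts if in_box(p, box) and len(matched_terms(p, terms)) >= min_terms]


posts = [
    Post("High fever and haemorrhage in my village", 8.5, -11.8),  # inside the box, two terms
    Post("I have a fever", 6.3, -10.8),                            # inside the box, one term
    Post("Ebola virus documentary tonight", 51.5, -0.1),           # two terms, outside the box
]
flagged = flag_posts(posts)
```

Requiring a cluster of terms rather than a single word is one way to reduce chance matches; only the first example post is flagged here.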
A system trying to identify influenza could mine terms that reflect visits to the doctor, purchase of tissues, paracetamol or aspirin from pharmacies, sick leave from work, as well as terms specific to the clinical syndrome of influenza.
But while there have been some attempts to use social media for disease surveillance in the past, such as Epi-Spider (an outbreak tracker in Atlanta, Georgia), none are currently operating.
Social media has, however, been successfully mined for other health applications. The CSIRO, for instance, developed a tool called WeFeel to measure the emotional pulse of countries using data from Twitter.
Using news media
Several publicly available web-based applications, such as HealthMap and MedISys, collect event-related information from news articles (but not social media). Data is automatically collected and processed, and is sometimes moderated by a human before potential health threats are identified and published.
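The general collect, score and moderate flow can be sketched in a few lines of Python. This is not how HealthMap or MedISys are actually built: the `Article` record, the relevance score, the 0.5 threshold and the `approve` callback standing in for a human moderator are all hypothetical. In systems where moderation is only applied sometimes, the review step would be optional.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Article:
    headline: str
    score: float  # hypothetical relevance score from an automated classifier


def triage(articles: list[Article], threshold: float = 0.5):
    """Split automatically processed articles into a human-review queue and discards."""
    queue = [a for a in articles if a.score >= threshold]
    discarded = [a for a in articles if a.score < threshold]
    return queue, discarded


def publish(queue: list[Article], approve: Callable[[Article], bool]) -> list[Article]:
    """Publish only the items a moderator approves; `approve` stands in for human judgement."""
    return [a for a in queue if approve(a)]


articles = [
    Article("Mystery haemorrhagic fever reported", 0.9),
    Article("Tips for the flu season", 0.2),
]
queue, discarded = triage(articles)
published = publish(queue, lambda a: "fever" in a.headline.lower())
```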
HealthMap was able to provide an alert for a “mystery haemorrhagic fever”, which became the 2014 West African Ebola outbreak, nine days before the World Health Organisation (WHO) announced the outbreak.
The WHO estimates that 60% of its initial outbreak alerts are from informal sources such as the Global Public Health Intelligence Network (GPHIN), a news aggregator developed by the WHO with Canadian Public Health.
From 2008, Google Flu Trends mined data from Google searches to predict influenza epidemics. But analyses of the value of this approach have been mixed, and Google ended the initiative in 2015.
Moderated expert sites
Expert sites that report unofficial information from health experts are also a valuable source of epidemic alerts. Flutrackers and ProMED-mail are moderated sites known for timely and high-quality outbreak information.
Many important epidemics have first surfaced on ProMED-mail, such as Middle East respiratory syndrome coronavirus (MERS-CoV) and Ebola. ProMED-mail has now teamed up with TEPHINET (Training Programs in Epidemiology and Public Health Interventions Network), HealthMap and the Skoll Global Threats Fund to create a new rapid epidemic detection system, EpiCore.
EpiCore is a closed virtual network of health professionals around the world who provide feedback on rumours and news stories to enhance epidemic surveillance.
Expert blogs are also a source of information, but can vary in reliability and quality.
Trade-off between accuracy and timeliness
Ideally, we want disease surveillance systems to obtain data that is both timely and valid, but achieving both at once is seldom feasible.
Traditional surveillance systems are subject to a number of checks to ensure the accuracy of their data. While this maximises validity, it introduces delays that limit its usefulness for early detection.
For rapid detection of epidemics, a trade-off between speed and data validity is required.
Social media-based surveillance isn’t a replacement for traditional surveillance, but an enhancement to it that improves our capacity to detect outbreaks early.
Once a rapid signal is acquired, public health authorities can then investigate and confirm the epidemic, and traditional surveillance can take over.
How can we better use social media?
Social media presents an opportunity to enhance epidemic detection and control. But unofficial information is unstructured and not created for public health purposes.
Algorithms designed to pick up “fever”, for instance, may detect false positives such as “Bieber fever”. So we need well-constructed algorithms for data mining.
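One simple way to handle the “Bieber fever” problem is an exclusion list: known non-medical phrases are removed before searching for the keyword. The sketch below assumes a small, hypothetical list of such phrases; a real filter would need a much larger, curated list or a trained classifier.

```python
import re

# Hypothetical exclusion list of non-medical uses of "fever"; a real list would be larger.
FALSE_POSITIVE_PHRASES = ["bieber fever", "cabin fever", "spring fever", "saturday night fever"]
FEVER = re.compile(r"\bfever\b")


def mentions_fever(text: str) -> bool:
    """True if 'fever' still appears after known non-medical phrases are removed."""
    cleaned = text.lower()
    for phrase in FALSE_POSITIVE_PHRASES:
        cleaned = cleaned.replace(phrase, " ")
    return FEVER.search(cleaned) is not None


print(mentions_fever("Bieber fever is real"))                        # False
print(mentions_fever("Woke up with a fever and chills"))             # True
print(mentions_fever("Bieber fever, and now an actual fever too"))   # True
```

Removing the phrase rather than discarding the whole post means a post that contains both a false positive and a genuine mention is still counted.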
The vast quantity of data available requires supercomputing power, and methods to reliably filter out background “noise”.
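The article does not name a specific noise-filtering method. A trailing moving average is one of the simplest options, damping one-day spikes (for example, from a single viral post) while preserving sustained rises. The window length is an assumption; seven days is a common choice for daily data because it evens out weekday effects.

```python
def rolling_mean(counts: list[float], window: int = 7) -> list[float]:
    """Trailing moving average; days before the first full window are skipped."""
    if window < 1:
        raise ValueError("window must be >= 1")
    return [
        sum(counts[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(counts))
    ]


# A one-day spike (9) is spread across the surrounding days.
print(rolling_mean([3, 9, 3, 3, 6], window=3))  # [5.0, 5.0, 4.0]
```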
Methods such as time series analysis can be used to compare several years of data, to test whether an epidemic signal is higher than expected compared with previous years. We already use these methods with traditional surveillance data, so they could also be applied to social media data.
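A minimal version of the comparison with previous years is a historical threshold: flag the current count if it exceeds the mean of the same week in earlier years by more than k standard deviations. This is only a sketch; k = 2 is an assumed setting, and established surveillance algorithms also adjust for long-term trends and for past outbreaks in the baseline.

```python
from statistics import mean, stdev


def exceeds_baseline(current: float, previous_years: list[float], k: float = 2.0) -> bool:
    """Flag `current` if it is above mean + k * SD of the same period in previous years."""
    if len(previous_years) < 2:
        raise ValueError("need at least two previous years to estimate spread")
    threshold = mean(previous_years) + k * stdev(previous_years)
    return current > threshold


# Mentions in the same week over five previous years (illustrative numbers):
# mean = 100, sample SD is about 7.9, so the threshold is about 115.8.
history = [100, 110, 90, 105, 95]
print(exceeds_baseline(140, history))  # True
print(exceeds_baseline(112, history))  # False
```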
Machine learning holds promise for the future, but we need thoughtful human analysis and expert interpretation of the data.
In the meantime, a more active approach could involve user engagement and participation in surveillance activities, where citizens can send reports or surveys directly to public health authorities via mobile applications or websites.
This article was co-authored by Sheng-Lun (Jason) Yan, a UNSW medical student who is currently researching a project on social media for epidemic intelligence.