Google Uses Big Data to Provide Critical Information for Prompt Disease Control
Back in 2009, a new flu virus, H1N1, was discovered, and it spread quickly. In the United States, the Centers for Disease Control and Prevention (CDC) asked doctors to report new flu cases so that it could take action to contain the spread. However, several factors limited how fast this information could be collated:
1. People might feel sick for a few days before seeing a doctor;
2. Reporting of new flu cases by doctors to the CDC was not immediate; and
3. The CDC only tabulated the information once a week.
Together, these factors meant that the information needed for prompt disease control was out of date by at least 2 weeks, at a moment when real-time information was required.
Google saw this as an opportunity to use its expertise in data analytics and its access to vast amounts of data to provide near real-time information by looking at what people were searching for on the Internet. Google took the 50 million most common search terms that Americans typed and compared them with CDC data on the spread of seasonal flu between 2003 and 2008, looking for correlations between the frequency of certain search queries and the spread of the flu over time and space. After processing a staggering 450 million different mathematical models, Google found a strong correlation between its predictions and the official nationwide CDC figures, and it could produce those predictions in near real time rather than 2 weeks after the event.
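The core idea, checking which search terms rise and fall together with official flu counts, can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not Google's actual method: the weekly numbers are made up, the function names `pearson` and `rank_terms` are hypothetical, and a simple Pearson correlation stands in for the far larger set of models described above.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation signal
    return cov / sqrt(var_x * var_y)


def rank_terms(term_counts, flu_cases):
    """Rank search terms by how strongly their weekly counts track flu cases."""
    scores = {term: pearson(counts, flu_cases)
              for term, counts in term_counts.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up weekly figures, for illustration only (not real CDC or Google data).
flu_cases = [10, 20, 40, 80, 40, 20]
term_counts = {
    "flu symptoms": [100, 210, 390, 820, 410, 190],
    "football scores": [500, 480, 510, 490, 505, 495],
    "sunscreen": [80, 60, 40, 20, 40, 60],
}

ranked = rank_terms(term_counts, flu_cases)
for term, score in ranked:
    print(f"{term}: {score:+.2f}")
```

Terms that score close to +1 on historical data are candidates for tracking flu activity as it happens, because search counts are available immediately while official figures arrive with a delay.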
This example illustrates how Google's system proved to be a more useful and timely indicator than government statistics, with their inherent reporting lags.
We are writing to share what we read about Big Data and related subjects with readers from around the world.