AWS and C3.ai Are Leading the Efforts for Public COVID-19 Data Lakes to Analyze, Predict and Enable Data Exchange


2Q 2020 | IN-5832



Big Data & COVID-19 Pandemic

NEWS


COVID-19 is severely disrupting the world economy. Businesses are affected not only by worker health, but also by panic, misinformation, and a lack of reliable data on which to base decisions. Internet of Things (IoT) technologies and services are playing a significant role in responding to this uncertainty: they enable both the collection of data and its analysis, so businesses can navigate and rebound from this crisis and become more resilient ahead of the next one. Big data holds the key to combating the pandemic. For instance, South Korea has collected extensive data on the density of infections within the country and, to maximize its value, has taken steps to make that data freely available to developers. The government’s goal is to encourage developers to create smart inventory systems that improve access to masks, medications, and other medical supplies in the locations where they are most needed. ABI Research forecasts that revenues for IoT data and analytics services will reach US$37 billion by 2021.

The COVID-19 pandemic has gone through several stages regarding data and interpretation of the statistical indicators:

  • Initially, medical data, testing data, data from connected medical devices, and even basic travel data were not available, which fueled panic and disinformation about the scope of the pandemic.
  • Then, data availability exploded as industries made it their mission to obtain, collect, and understand data to make informed decisions.
  • Finally, despite the availability of data, there is little clarity about what most of it means. The data has been questioned over its quality, sources, and underlying structural issues. It has become critical to understand and assess the contextual and environmental state of the data obtained, the frequency of updates, and how metadata is assembled for further manipulation and Machine Learning (ML), specifically in the IoT domain. At the same time, other types of data and collection strategies are emerging, such as the Bluetooth-based contact tracing data from Google and Apple.
  • Additionally, data silos are emerging, which reinforces the broader (not just IoT) struggle with adequate data management and the barriers to achieving it.

COVID-19 Data Lakes

IMPACT


As the COVID-19 pandemic continues to disrupt business-as-usual across the globe, the large cloud vendors are taking the opportunity to capitalize on and reinforce their relevance through big data. Some vendors, such as Amazon Web Services (AWS) and C3.ai, have already moved to create publicly accessible data lakes for analyzing the novel coronavirus:

  • AWS COVID-19 Data Lake is a centralized repository of up-to-date information and datasets on, or related to, the spread and characteristics of COVID-19. Hosted on the AWS cloud, the data lake tracks and adds datasets from Johns Hopkins and The New York Times, along with more than 45,000 research articles about the virus from the Allen Institute for AI. The data is publicly accessible through Amazon S3 and can be exposed through the AWS Glue Data Catalog to analytics engines, including the serverless SQL query engine Amazon Athena.
  • C3.ai COVID-19 Data Lake is a centralized repository of COVID-19 data integrated from multiple sources into a unified, “analytics-ready” data model, with RESTful Application Programming Interfaces (APIs) for data manipulation and insight extraction. Using C3.ai’s proprietary AI engine, the data stored in the data lake can be explored interactively through ML and AI. Leading universities, the United Nations, and the Centers for Disease Control and Prevention (CDC) are in active discussions with C3.ai about its data utilization efforts. The C3.ai COVID-19 Data Lake is intended to accelerate the application of Enterprise AI to alleviate COVID-19. For example, C3 Enterprise AI can support medical breakthroughs by enabling genome-specific COVID-19 medical protocols; modeling, simulation, and prediction of efficacy against COVID-19; and logistics planning and optimization, while bringing together public health efforts, tools, methodologies, and best practices from around the world in a single unified system.
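
Both offerings follow the same pattern: records from several sources are brought into one schema and then queried with SQL or an API. The sketch below illustrates that pattern only. It uses Python's built-in `sqlite3` as a local stand-in for a serverless SQL engine such as Amazon Athena, and the two source feeds, their field names, and all values are made-up placeholders rather than real case counts or the actual schemas of either data lake.

```python
import sqlite3

# Hypothetical raw records from two sources that name their fields differently.
# All values are made-up placeholders, not real case counts.
SOURCE_A = [
    {"Province_State": "Region A", "Confirmed": 10, "Last_Update": "2020-04-01"},
    {"Province_State": "Region B", "Confirmed": 4, "Last_Update": "2020-04-01"},
]
SOURCE_B = [
    {"state": "Region A", "cases": 12, "date": "2020-04-02"},
    {"state": "Region B", "cases": 7, "date": "2020-04-02"},
]


def normalize_a(rec):
    """Map a source A record onto the shared (region, date, confirmed, source) schema."""
    return (rec["Province_State"], rec["Last_Update"], int(rec["Confirmed"]), "source_a")


def normalize_b(rec):
    """Map a source B record onto the same shared schema."""
    return (rec["state"], rec["date"], int(rec["cases"]), "source_b")


def build_lake(rows):
    """Load normalized rows into an in-memory table (stand-in for the data lake)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cases (region TEXT, report_date TEXT, confirmed INTEGER, source TEXT)"
    )
    conn.executemany("INSERT INTO cases VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def latest_by_region(conn):
    """Return the most recent report for each region, whichever source supplied it."""
    sql = """
        SELECT c.region, c.report_date, c.confirmed, c.source
        FROM cases AS c
        JOIN (SELECT region, MAX(report_date) AS d FROM cases GROUP BY region) AS m
          ON c.region = m.region AND c.report_date = m.d
        ORDER BY c.region
    """
    return conn.execute(sql).fetchall()


rows = [normalize_a(r) for r in SOURCE_A] + [normalize_b(r) for r in SOURCE_B]
conn = build_lake(rows)
latest = latest_by_region(conn)
```

Keeping a `source` column in the shared schema is one simple way to preserve provenance once records from different feeds sit in the same table.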

As a result, public health data is taking a small step toward standardization and large-scale data exchange, with the hope of creating a centralized data catalog that records each source so that metadata can be generated for the various datasets.
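
To make the idea of a centralized data catalog more concrete, here is a minimal sketch of one possible shape for it: each dataset is registered with its source, storage location, update frequency, and column types, and the catalog can then answer simple questions such as which datasets share a column. The class names, fields, dataset names, and locations are all hypothetical and are not taken from AWS Glue or any other product.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CatalogEntry:
    """Metadata for one dataset; every field name here is hypothetical."""
    name: str
    source: str
    location: str
    update_frequency: str
    columns: Dict[str, str] = field(default_factory=dict)


class DataCatalog:
    """A tiny in-memory catalog keyed by dataset name."""

    def __init__(self) -> None:
        self._entries: Dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        """Add a dataset, refusing to overwrite an existing name silently."""
        if entry.name in self._entries:
            raise ValueError(f"dataset already registered: {entry.name}")
        self._entries[entry.name] = entry

    def find_by_column(self, column: str) -> List[str]:
        """Names of all datasets that expose the given column."""
        return sorted(n for n, e in self._entries.items() if column in e.columns)

    def sources(self) -> List[str]:
        """Distinct data sources recorded in the catalog."""
        return sorted({e.source for e in self._entries.values()})


catalog = DataCatalog()
catalog.register(CatalogEntry(
    name="daily_case_counts",
    source="university_dashboard",          # placeholder source label
    location="s3://example-bucket/cases/",  # placeholder location
    update_frequency="daily",
    columns={"region": "string", "report_date": "date", "confirmed": "int"},
))
catalog.register(CatalogEntry(
    name="hospital_capacity",
    source="health_agency",                 # placeholder source label
    location="s3://example-bucket/beds/",   # placeholder location
    update_frequency="hourly",
    columns={"region": "string", "beds_available": "int"},
))

# Datasets that could be joined on a shared "region" column.
joinable = catalog.find_by_column("region")
```

Recording the update frequency next to each source addresses one of the data quality questions raised earlier, namely how fresh a given dataset is.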

Technology Application

RECOMMENDATIONS


Essentially, there is a need to democratize access to data, which will likely drive innovation in new business models such as Data-as-a-Service (DaaS) and Analytics-as-a-Service (AaaS). COVID-19 demonstrates that data is indeed the new gold, but it needs to be accessible, particularly in times of crisis. With C3.ai and AWS already capitalizing on the data market by offering centralized repositories (data lakes), data exchange is being enabled between different parties to predict COVID-19 epidemiology and its further impact. Apart from data lakes, other technologies are also expected to gain ground and become even more valuable in the government, healthcare, and transportation sectors. For example, Nokia is stepping up with its SpaceTime Scene Analytics, which has a deep learning capability: models are trained in the cloud and can be deployed at the edge. The video analytics technology has self-learning capabilities, unlike the traditional image segmentation approach. Hence, such solutions are expected to raise interest in IoT across all sectors as businesses harden their operations against another crisis similar to COVID-19.

Additionally, a number of data-enabled techniques can be widely used by technology companies and enterprises to mitigate a future pandemic like COVID-19:

  • ML and AI applications to calculate and predict the spread of viral diseases
  • Sharing of clinical trial data, processed through an AI engine to identify the methods, medications, and levels of intervention that have proven successful in the past
  • Data analytics research with data mining and data integration from various IoT edge computing/gateway devices, e.g., bed availability
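
As a rough illustration of the first technique, the sketch below simulates the spread of an infection with a classic SIR (susceptible, infected, recovered) compartmental model. This is a simple epidemiological model, not an ML model and not a method described by any vendor above. The population size, transmission rate `beta`, and recovery rate `gamma` are illustrative assumptions, not estimates for COVID-19.

```python
def simulate_sir(population, initial_infected, beta, gamma, days, steps_per_day=10):
    """Simulate an SIR model with forward Euler steps.

    Returns one (susceptible, infected, recovered) tuple per day,
    starting with day 0.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    dt = 1.0 / steps_per_day
    history = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            # New infections scale with contacts between susceptible and infected people.
            new_infections = beta * s * i / population * dt
            # A fixed fraction of infected people recover per unit of time.
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        history.append((s, i, r))
    return history


# Illustrative parameters only (basic reproduction number beta / gamma = 3).
history = simulate_sir(
    population=1_000_000, initial_infected=10, beta=0.3, gamma=0.1, days=200
)

# Day on which the number of simultaneously infected people is highest.
peak_day = max(range(len(history)), key=lambda d: history[d][1])
peak_infected = history[peak_day][1]
```

In practice, `beta` and `gamma` would be fitted to observed case data, which is where the data lakes and ML tooling discussed above would come in.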

