Back to the Future - From Delphi Epidata to COVIDcast, and Back

#epidata #COVIDcast

Roni Rosenfeld, Peter Jhon, Carlyn Van Dyke

    For those of you who first heard of the Delphi group during the pandemic, we may well be synonymous with COVIDcast, our repository of real-time, geographically detailed COVID-19 related signals/indicators.

    But there is much more to Delphi than COVIDcast. We were founded in 2012 with the mission to develop the theory and practice of epidemic forecasting. For the first eight years of our existence, we focused mostly on forecasting flu and dengue. In the early years, we found ourselves spending a lot of time and effort identifying and reconstructing properly versioned historical training data, that is, data that records which version was reported on which date (“what was known when?”). Versioning is critical for training statistical machine learning forecasting models that are to be used in real time, because such models should be trained only on the data that was actually available when each forecast would have been made. To save fellow researchers from having to duplicate this effort, in 2016 PhD student David Farrow created a database and Application Programming Interface (API) to store and publicly serve properly versioned epidemic surveillance data streams – thus Delphi Epidata was born.
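    To make the “what was known when?” idea concrete, here is a minimal sketch of a versioned data store. It is an illustration only, not the Delphi Epidata schema or API: the class `VersionedSeries`, its methods `report` and `as_of`, and the sample values are all hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import date


class VersionedSeries:
    """Toy store that keeps every reported version of each observation.

    Hypothetical illustration; not the actual Delphi Epidata design.
    """

    def __init__(self):
        # reference date -> list of (issue date, value), sorted by issue date
        self._versions = defaultdict(list)

    def report(self, reference_date, issue_date, value):
        """Record the value for reference_date as published on issue_date."""
        versions = self._versions[reference_date]
        versions.append((issue_date, value))
        versions.sort(key=lambda version: version[0])

    def as_of(self, reference_date, as_of_date):
        """Return the latest value issued on or before as_of_date, else None."""
        versions = self._versions.get(reference_date, [])
        issue_dates = [issue for issue, _ in versions]
        idx = bisect_right(issue_dates, as_of_date)
        return versions[idx - 1][1] if idx else None


# Made-up example: a weekly value that is revised one week after first release.
series = VersionedSeries()
week = date(2019, 1, 5)
series.report(week, date(2019, 1, 11), 2.1)  # first release
series.report(week, date(2019, 1, 18), 2.6)  # later revision

first_view = series.as_of(week, date(2019, 1, 12))   # sees only the first release
later_view = series.as_of(week, date(2019, 1, 25))   # sees the revision
```

    A forecaster backtesting a model for mid-January would query the store as of that date and train on the first release rather than on the later revision, which was not yet known.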

    From 2016 to early 2020, Delphi Epidata grew steadily, adding myriad signals related to flu, dengue, and norovirus. Most of these covered the U.S., but some came from Taiwan, South Korea, and South America. This growth was driven by our research interests and by our participation in community forecasting challenges.

    When the pandemic broke out, we naturally turned our focus to COVID-19. Our collection of signals grew dramatically, from a few dozen to several hundred, with all new signals focusing on COVID-19 in the U.S. We also created a new Delphi Epidata ingestion pipeline, database schema, API endpoint, and visualization website, all focusing on COVID-19 signals, and gave them the name COVIDcast. COVIDcast is thus the portion of Delphi Epidata that focuses on COVID-19-related signals. It is currently the biggest portion of Delphi Epidata, and constitutes what we believe may be the largest public repository of real-time, geographically detailed indicators of COVID-19 activity in the U.S.

    As the pandemic’s critical phase has subsided, we have been gradually returning to our long-term mission: to develop the theory and practice of epidemic forecasting for all existing and emerging pathogens and other fast-moving public health concerns. At the signals level, this means zooming out from just COVIDcast to the bigger picture, returning the focus to Delphi Epidata. We have recently reconfigured our website accordingly. The move from COVIDcast to Delphi Epidata is not a rename, but rather a post-pandemic return to our original, broader concept.

    ——

    Delphi Epidata was purpose-designed for hosting signals for epidemic and pandemic detection, tracking and forecasting. It has built-in support for data versioning, calendar reporting effects, anomaly and trend detection, backfill projection, privacy-based censoring, and geographic, temporal and demographic breakdown and aggregation. The Delphi Epidata repository contains over 500 different current or historical signals, tracking flu, COVID-19, dengue, norovirus, and other pathogens, and covering all rungs of the severity pyramid.

    We procure data streams that reflect epidemic and pandemic activity from a wide variety of sources – including unique industry partnerships and scraping of publicly available data – and extract from them, in real time, disease-related signals at the finest possible geographic, demographic, and temporal granularity.

    We make all our signals freely available in real time to the greatest extent allowable, through a public API that is updated with new data daily. We also provide:

    Delphi is committed to its mission of advancing the theory and practice of epidemic detection, tracking and forecasting. We welcome your comments, questions, and suggestions as we continue to grow.


    Roni Rosenfeld is a Principal Investigator in the Delphi group and a Professor and Head of the Machine Learning Department at CMU. He is also a Google Fellow.
    Peter Jhon
    Carlyn Van Dyke
    © 2025 Delphi group authors. Text and figures released under CC BY 4.0; code under the MIT license.
