Top 200

University of Chicago

  • Health

  • United States

  • Economically disadvantaged people

  • School (Public charity)

  • $3,244,736,644 (2015)

  • 25,036

Executive Summary

Cancer is driven by genomic mutations. By understanding these mutations, we can develop new drugs, refine diagnostic methods, and improve cancer treatments by learning which combinations of drugs work best for which tumors. The challenge is that the data needed is locked in silos at medical research centers, so researchers and physicians lack access to the critical mass of data required. This project uses a technology called a data commons to safely and securely share large genomic datasets. We will build a commons called the Contribute & Change (C2) Cancer Commons that spans seven major medical research centers. We will also build a hub called the Cancer Commons Hub linking our commons with four others to create one of the largest cancer data sharing systems in the world. This will change the way that we share cancer data and improve the outcomes of patients with cancer everywhere.

The Problem

Each year, over 595,000 cancer patients will die. Cancer is a group of related diseases in which the mechanisms that limit cell growth are damaged, and cells begin to grow, divide and spread. This uncontrolled growth is the result of genomic mutations. By understanding these mutations, many of which are rare, we can improve the diagnosis of cancer, the treatments that patients receive, and over time, patient outcomes. There are several challenges:

Challenge 1. The genomic and associated clinical and outcome data are quite large, and most medical research centers struggle with the computing infrastructure required to manage and analyze the data.

Challenge 2. Data from a large number of patients are required, since we must understand rare mutations and their combinations, and which drugs and drug combinations are most effective. This requires that data be shared, since no single medical research center has enough data on its own.

Challenge 3. We don't have a widespread culture of data sharing in cancer today. Most medical research centers do not share their data. When they do, release of data is slow and the raw genomic data required by best practices is not shared. Negative results are rarely shared.

Challenge 4. There is a competitive aspect to the business of cancer. Several companies are spending $100 million or more to buy up cancer-related data, pool it, and sell it to biopharmaceutical companies, which can take valuable data out of the cancer sharing system.

Proposed Solution

Our proposed solution is to: 1) help medical research centers extract data from their IT systems and contribute it to an emerging technology platform for data sharing called a data commons; 2) create a hub so that multiple data commons can themselves share cancer-related data in a secure and compliant fashion; and 3) allow patients to contribute their data to the commons. The commons we will build is named the Contribute & Change Cancer Commons, or C2 Cancer Commons. We will launch the C2 Cancer Commons with the seven founding partners named in the Team section. The C2 Cancer Commons will be a contributor to the Cancer Commons Hub. We will build the Cancer Commons Hub by integrating four additional commons: the NCI Genomic Data Commons (the largest genomic data commons), the MED-C Biomedical Data Commons (the first data commons serving the payer-provider community), AACR's GENIE network, and the Cancer Collaboratory (which holds international cancer data). Later we will add further data commons. This approach will create one of the world's largest collections of cancer genomics and associated clinical data. It will accelerate the pace of basic research and make available to clinical oncologists the data required to select the best combination of drugs to target an individual patient's tumor. With the C2 Cancer Commons and Cancer Commons Hub, we will change the way that we share cancer data and improve the outcomes of patients with cancer.

Evidence of Effectiveness

The most direct evidence of a Commons' ability to accelerate discovery and improve cancer outcomes comes from The Cancer Genome Atlas (TCGA), an NIH project (2007-2017) to characterize the genomics of 12,000 cancer patients and to make this data available as a community resource. The analysis of TCGA data has identified subpopulations of patients with targetable variants in several cancer types, resulting in new active clinical trials in breast, ovarian, endometrial, brain, and other cancers. TCGA data has also been used to show that machine learning algorithms do a better job than pathologists in diagnosing lung cancer, and it has led to better prognostic tests for brain and breast cancer. The TCGA had several limitations: the data was not fully integrated (there were two data repositories), it was not fully harmonized (different TCGA projects analyzed the data using different methods), and data from electronic medical records, such as the response to therapy, was not included. The first two issues have been addressed by the NCI Genomic Data Commons (GDC) for NIH-funded research projects, and the GDC will be integrated into this project through the Hub. However, there is no GDC-style commons for data distributed across medical record systems at different medical research centers; building one is among the goals of this project. The scientific and policy consensus is that the integration and sharing of large biomedical datasets via data commons will result in healthcare advances, a concept that is endorsed by the Cancer Moonshot initiative.

Previous Performance

Since 2008, the Open Commons Consortium (OCC) has worked collaboratively with partners to build cloud-based and commons-based infrastructure to serve the research community and to increase the sharing of scientific data. UChicago is one of the top universities in the world, and its scientists have made fundamental discoveries in cancer, including identifying the genetic basis of cancer and introducing the appropriate statistical framework for computing survival curves. The seven medical research centers contributing data to the C2 Cancer Commons include some of the leading cancer researchers in the world. The other commons interoperating with the Cancer Commons Hub include two of the largest cancer commons in the world. Amazon Web Services and the Google Cloud Platform are two of the most scalable cloud computing platforms in the world. In 2010, the OCC, with funding from the Moore Foundation, launched the Open Science Data Cloud, the first petabyte-scale science cloud. It has been used by over 860 projects from 55 organizations in 14 countries. In 2014, UChicago, with OCC support, built the Bionimbus Protected Data Cloud, the first biomedical cloud designated as an NIH Trusted Partner, allowing it to compute over NIH-managed human genomic data. In 2016, UChicago, with OCC support, built the NCI Genomic Data Commons (GDC), which was launched on June 6, 2016 by Vice President Biden. This is one of the world's largest data commons. The proposed Commons and Hub will be based upon the open source software stack developed for Bionimbus and the GDC.

The Team

Team Purpose

We believe that data sharing can improve cancer outcomes. Two intertwined and tractable problems in cancer research today are: 1) the culture in which each institution keeps its cancer data in a silo and releases it only slowly and incompletely; and 2) that these silos individually represent too few patients to yield meaningful insight into the genomic diversity that drives cancer. Our purpose in coming together in this collaboration is to tackle both facets of this problem. The partners agree that data should be released quickly and that, together, we represent a sufficient number of institutions capable of generating data sets large enough (i.e. representative of enough cancer patients) to gain insight into the molecular drivers of cancer. We further agree that the Commons created here will provide evidence of the clinical utility of precision medicine and pave the way to payer coverage of genomic testing.

Team Structure

The Team will be led by Robert Grossman of the University of Chicago's (UChicago) Center for Data Intensive Science (CDIS). UChicago will have administrative, financial and reporting responsibility for the project. The not-for-profit Open Commons Consortium will be the lead organization for the governance and operation of the C2 Cancer Commons and Cancer Commons Hub. Key decisions about the operations, management, data access policies, and governance of the commons and hub will be made by an Executive Committee, with recommendations made by a Working Group consisting of the various organizations working on the project. Each data contributor to the Commons and each data alliance member of the Hub will join the Open Commons Consortium if it is not already a member. The initial academic medical centers contributing data to the Commons are: The Children's Hospital of Philadelphia (CHOP) (PA), Johns Hopkins University (MD), Indiana University (IN), Mount Sinai's Icahn School of Medicine (NY), Northwell Health (NY), NorthShore University Health System (IL), and University of Chicago (IL). The initial members of the Hub are: the Molecular Evidence Development Consortium (MED-C), the American Association for Cancer Research's (AACR) Project GENIE, OICR's Cancer Collaboratory, and the NCI's Genomic Data Commons (GDC).

Past Funders

  1. Virginia and D.K. Ludwig Fund for Cancer Research
  2. John Templeton Foundation
  3. Pritzker Foundation
  4. The Andrew W. Mellon Foundation
  5. Bill & Melinda Gates Foundation
  6. Vanguard Charitable Endowment Program
  7. The John D. and Catherine T. MacArthur Foundation
  8. The Neubauer Family Foundation
  9. Matthew and Carolyn Bucksbaum Family Foundation
  10. Irving Harris Foundation
