Canadian and U.S. leaders in cancer research announce a big data challenge to develop robust methodologies for predicting cancer mutations

November 8, 2013

An open Challenge that merges the efforts of the world’s largest cancer genome sequencing consortia, the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA), with those of Sage Bionetworks and DREAM.

TORONTO, CANADA – Cancer research leaders from the Ontario Institute for Cancer Research (OICR) and the University of California, Santa Cruz (UCSC), in collaboration with Sage Bionetworks and IBM’s DREAM, will announce tomorrow the opening of the ICGC-TCGA-DREAM Somatic Mutation Calling (SMC) Challenge at the Sixth Annual RECOMB/ISCB conference.

Like previous DREAM Challenges in the series, this new Challenge will engage a diverse community of scientists to solve a specific problem in a given time period by placing scientific data, tools, scoreboards and the resulting predictive models into an open Commons.

The specific problem the SMC Challenge will address is the need for accurate methods to identify cancer-associated mutations from whole-genome sequencing data. Cancer is a disease of the genome, caused by disruptions in DNA that alter specific gene functions. Although today’s DNA sequencing instruments can amass great quantities of sequence data from a patient’s normal and tumor tissues, the ability to identify DNA mutations and rearrangements accurately on the basis of those data remains elusive; current studies agree in only about 20% of their predictions.

To address this need, the Challenge will post the raw DNA sequencing data of 10 human tumor-normal pairs (5 prostate, 5 pancreatic), comprising approximately 9 terabytes of data, to a high-speed distribution server. Contestants will have 6 months to optimize their predictive models. After the Challenge closes in July 2014, at least 5,000 candidate DNA mutations predicted by different participating teams will be prospectively validated on an independent sequencing platform by the Challenge organizers. Participants’ predictions will then be ranked against the newly generated validation data on sensitivity, specificity and balanced accuracy, among other metrics.
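For readers unfamiliar with the ranking metrics named above, the following is a minimal sketch of how sensitivity, specificity and balanced accuracy are conventionally computed from a set of validated sites. It is not the Challenge's scoring code. The function name `score_calls`, the representation of a mutation as a `"chromosome:position"` string, and the example sites are all assumptions made for illustration.

```python
def score_calls(predicted, validated_true, validated_sites):
    """Score one team's mutation calls against validation results.

    predicted       -- sites the team called as somatic mutations
    validated_true  -- validated sites confirmed as real mutations
    validated_sites -- every site tested on the validation platform

    All three are collections of hashable site identifiers
    (hypothetical "chromosome:position" strings in the example below).
    Returns (sensitivity, specificity, balanced_accuracy).
    """
    validated_sites = set(validated_sites)
    # Only sites that were actually validated can be scored.
    predicted = set(predicted) & validated_sites
    validated_true = set(validated_true) & validated_sites
    validated_false = validated_sites - validated_true

    tp = len(predicted & validated_true)    # real mutations that were called
    fn = len(validated_true - predicted)    # real mutations that were missed
    fp = len(predicted & validated_false)   # non-mutations that were called
    tn = len(validated_false - predicted)   # non-mutations correctly left out

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one true and one false validated site")

    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    balanced_accuracy = (sensitivity + specificity) / 2
    return sensitivity, specificity, balanced_accuracy


# Illustrative data only: five validated sites, three confirmed as real.
sites = {"chr1:100", "chr1:200", "chr2:300", "chr2:400", "chr3:500"}
confirmed = {"chr1:100", "chr1:200", "chr2:300"}
team_calls = {"chr1:100", "chr1:200", "chr2:400"}
sens, spec, bal_acc = score_calls(team_calls, confirmed, sites)
```

Balanced accuracy averages the two rates, so a method cannot score well simply by calling every site a mutation or by calling none.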

To participate in the Challenge, which opens tomorrow, individuals will need to register. In addition, they must be approved by OICR’s ICGC Data Access Compliance Office to access the data.

As Canadian OICR researcher and Challenge organizer Professor Paul Boutros puts it, “Governments around the world have committed hundreds of millions of dollars to sequence cancer genomes to find new drug targets and to develop treatments that are personalized to each person’s cancer genome. But realizing these goals is currently blocked by scientists’ inability to identify mutations in cancer genomes. It is really tremendous that ICGC and TCGA are coming together with Sage Bionetworks and DREAM to address this problem using a DREAM Challenge that will set a gold standard that groups around the world can use to understand the cancer genome!”

To help realize this Challenge, industrial partners have stepped up. Google is making its Google Cloud Platform available to OICR-approved participants, including free access to the contest data in Google Cloud Storage and discounted Google Compute Engine cycles. Cloud processing will open the door for a whole new set of participants who do not have access to large compute clusters at their own institutions. Hitachi has provided free storage to host the data on a 1 PB disk donated for cancer genomics. Annai Systems is providing its Annai-GNOS™ data management platform to facilitate upload, hosting and access to the data in the Hitachi store. Annai’s GeneTorrent software will provide high-speed data transfer to the Challenge participants.

Challenge participants will use the Synapse infrastructure, built by Sage Bionetworks, which allows Challenge teams to collaborate on an open platform. Synapse’s tools and forum will allow Challenge participants to: (1) record what processing and analysis they’ve done on the data; (2) submit their predictive models to a real-time leaderboard for scoring; and (3) share their ideas, model code and analysis results with others in the Challenge.

Publications based on the highest-ranking predictive methods from the Challenge will be considered with Nature Publishing Group. Nature Genetics editor Myles Axton will advise the Challenge on publication strategy and work with Synapse to understand the scientific quality control that can be obtained via competitive collaboration. “The exciting thing about this exercise from an editorial standpoint is that we can analyze just how much the strategies are improved during the contest and how much peer review is then needed to obtain a useful research publication at the end. The beauty of doing this on an open platform is to see the rigor, transparency and detail of each group’s approach and to be able to replay each strategy in a robust way,” Myles says. “This is a good use of editorial time since peer review improves the strategies, it improves the resulting publications and it improves the databases and journals by preparing us for the future of knowledge production. I really hope the winners combine elements of the best strategies into fuller publishable units, in that way they will get the best out of the challenge as well as our involvement with it.”

Explains UC Santa Cruz Professor Josh Stuart, “The timing of this Challenge couldn’t be better. ICGC and TCGA recently announced that they plan to jointly analyze a dataset of approximately 2,000 pairs of tumor-normal whole genomes as part of a 2014-2015 Pan-Cancer effort to elucidate comprehensively the genomic changes present in many forms of cancers. Thus, the winning algorithms selected by this DREAM Challenge will help ICGC/TCGA researchers provide the largest unified view of cancer genome variation to date.”

Cancer researcher Dr. Stephen Friend founded Sage Bionetworks out of a conviction that “…the best approach towards developing robust and accurate predictions such as those needed for mutation calling is to enable an open diverse community where data access is simple and people are incentivized to share. Sage and DREAM have already shown that in the span of several months, DREAM Challenges can attract hundreds of teams who end up submitting thousands of predictive models to a Challenge. Sage and DREAM couldn’t be more delighted to be partnered with the ICGC and TCGA research communities to provide the largest public methodology assessment for the field of somatic mutation identification.”

Press Release