Repository
A growing catalogue of globally-coherent datasets
As part of building the Sage Commons infrastructure and community, Sage Bionetworks is compiling a catalogue of globally-coherent datasets for use in integrative genomics and in building predictive computational disease models. Our goal is to collate, curate and host these datasets for the community.
Globally-coherent datasets (GCDs) contain genome-wide DNA variation data, intermediate trait data, and physiological phenotype data from a population of individuals large enough to power association or linkage studies (often 50 or more individuals). Intermediate traits are typically gene expression levels, but may also include proteomic, metabolomic, and other molecular data. To be considered coherent, the data must be matched with consistent identifiers.
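To make the coherence criterion concrete, here is a minimal sketch of how one might check a candidate dataset. It assumes each data layer (genotypes, intermediate traits, phenotypes) is a Python dict keyed by individual identifier; the function names and the default threshold of 50 individuals (taken from the rule of thumb above) are hypothetical illustrations, not a Sage Bionetworks tool or format.

```python
from __future__ import annotations

from typing import Any, Mapping


def matched_individuals(*layers: Mapping[str, Any]) -> set[str]:
    """Return the identifiers present in every data layer.

    Hypothetical helper: each layer maps an individual's identifier
    to that individual's data (genotype calls, expression values, etc.).
    """
    if not layers:
        return set()
    shared = set(layers[0])
    for layer in layers[1:]:
        # Keep only identifiers that also appear in this layer.
        shared &= set(layer)
    return shared


def is_globally_coherent(
    genotypes: Mapping[str, Any],
    intermediate_traits: Mapping[str, Any],
    phenotypes: Mapping[str, Any],
    min_individuals: int = 50,  # assumed threshold, from "often 50 or more"
) -> bool:
    """Hypothetical check: enough individuals are matched across all layers."""
    shared = matched_individuals(genotypes, intermediate_traits, phenotypes)
    return len(shared) >= min_individuals
```

For example, if genotype and expression data exist for 60 individuals but phenotypes for only 55 of them, only those 55 individuals count as matched. Counting the intersection, rather than each layer separately, reflects the requirement that every layer refer to the same individuals through consistent identifiers.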
The datasets described on the pages linked below represent our current knowledge of GCDs worldwide, and this catalogue will be updated frequently as further information becomes available. Some of the datasets are freely available from Sage or other sources, while others are in transition to release or still in preparation. In some cases, individual elements of a GCD and its associated models may have to be linked to different repositories in order to respect investigator obligations or applicable regulations.
1. Sage Available and Sage Transition Datasets
Sage Available datasets can currently be obtained from the Sage Bionetworks servers as a complete download package, as a subset of a dataset along with relevant links, or through web links. Sage Transition datasets are in the process of being made available from the Sage web site; interested researchers should check back for updates.
This page describes datasets with known or anticipated legal release problems that must be addressed before they can be posted publicly.
This page lists known datasets that are currently being generated.
These datasets may be helpful resources in combination with other research datasets and tools even though they do not fully fit the definition of a globally-coherent dataset.
The repository project is an important step in the development of the Sage Commons. The Commons will be the environment for shared research on biological network models and their application to problems of human disease and biology. It will contain network models derived from globally-coherent datasets, the datasets themselves, and the analytical methods and code used to generate the network models. Sage Commons models, data, and code will be well annotated and of robust quality so that they can be legitimately combined for meta-analyses within the Sage Commons and with the data and models of other researchers.
Invitation to Participate
Researchers interested in contributing datasets, analyses and models are invited to contact us at repdata@sagebase.org for further discussion.
