ENCODE Project at NHGRI Integrative Analysis of ENCODE Data

Updated 5 September 2012

Introduction

The major goal of the ENCODE project is to identify all functional elements in the human genome sequence, where a functional element is defined as a discrete region of the genome that encodes a reproducible biochemical signature. ENCODE data production groups generate data and submit it to the ENCODE Data Coordination Center (DCC) for quality control and release. A cross-consortium effort to integrate all the data types and produce useful interpretations for the community has been completed, and the results have been published as the ENCODE integrative analysis publication package. This page describes a series of resources associated with the integrative analysis of ENCODE data.

Analysis Tools and Other Resources

ENCODE Analysis Package Publications and Website

A description of the ENCODE project, data production, data display, and data download has been published previously in the article, A user's guide to the encyclopedia of DNA elements (ENCODE). The resources in this article, and follow-on analyses, are described in detail in an extensive package of ENCODE integrative analysis publications and on a Nature microsite. Questions regarding the package should be directed to Ian Dunham.

ENCODE Analysis Virtual Machine

The supplementary information for the ENCODE integrative analysis Nature publication includes a set of code bundles that provide the scripts and processing steps corresponding to the methodology used in the analyses associated with the paper. Using these code bundles, the analysis group has built an ENCODE analysis virtual machine in which each analysis program has been tested and run. The virtual machines are freely available to anyone who wants to work with the data and tools used in the integrative analysis.

Questions regarding the ENCODE analysis virtual machines should be directed to James Taylor or Dannon Baker.

Software Tools

A page describing the software tools used in the ENCODE project is provided on the ENCODE portal.

Data Standards and Quality Metrics

As part of the integrative analysis, the ENCODE project has established a number of standards, viewable from the ENCODE portal at the Data Standards page. A detailed description of the ChIP-seq standards is provided in the publication: Landt et al., ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia. Genome Res. 2012 Sep;22(9):1813-31. Metrics that have been developed to enable comparison and standardized processing of the data are described on the portal Quality Metrics page.

Other ENCODE Publications

A list of publications funded by the ENCODE project and publications from groups outside the consortium using ENCODE data can be found on the ENCODE Publications page.

Data

Data Coordination Center Resources

All ENCODE production data is submitted to the DCC at the University of California, Santa Cruz (UCSC). Data is reviewed for quality and released for display as tracks in the UCSC Genome Browser and for download at the UCSC downloads site. There are a number of useful tools including track and file search to assist with locating data of interest. The DCC maintains the ENCODE portal providing access to this data and tools for interpreting and accessing the data.

Analysis Data Hub

The integrative analysis process has been a distributed effort by many groups. Individual analysts downloaded and processed files from the ENCODE download site, and created intermediate and final analysis products in various forms. Now that the analysis has been completed, the analysis data is being made available for viewing and download through a UCSC public data hub. This data hub includes descriptions of ENCODE data in uniformly processed signal and element representations, as well as genome segmentations. The ENCODE downloads page includes an Analysis Hub section that provides access to files on the hub. The ENCODE Integrative Analysis Data Hub can also be visualized in the UCSC Genome Browser.

Analysis FTP Site

Access to the analysis products is also provided via anonymous FTP from the EBI ENCODE analysis FTP server. This site contains an organized file structure with the ENCODE analysis datasets located in subdirectories within the byDataType directory.
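As a rough illustration of browsing such a layout over anonymous FTP, the following Python sketch uses the standard ftplib module. The host name is a placeholder, because this page does not give the server address, and the example subdirectory name "segmentations" is hypothetical. Only the byDataType directory name comes from the description above, and its position at the server root is an assumption.

```python
from ftplib import FTP
import posixpath

# Placeholder host: substitute the real analysis FTP server address.
FTP_HOST = "ftp.example.org"
# Directory name taken from the page; its location at the root is assumed.
BASE_DIR = "byDataType"


def data_type_path(data_type, base_dir=BASE_DIR):
    """Return the remote directory for one data type under the base directory."""
    return posixpath.join(base_dir, data_type)


def list_directory(ftp, path):
    """Return the sorted entry names that the server reports for a directory."""
    return sorted(ftp.nlst(path))


def list_data_types(host=FTP_HOST, base_dir=BASE_DIR):
    """Log in anonymously and list the entries of the base directory."""
    with FTP(host) as ftp:
        ftp.login()  # calling login() with no arguments logs in anonymously
        return list_directory(ftp, base_dir)
```

With the real server address filled in, list_data_types() would return the entries under byDataType, and data_type_path("segmentations") would give the path to pass to list_directory for a single (hypothetical) data type.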

Other Locations

ENCODE data is also available through other genomics portals, including Ensembl and the NCBI Gene Expression Omnibus (GEO). Raw sequence data is deposited in the sequence read archives SRA and ENA.