Consensus digital genomic footprints from ENCODE DNase I data

Description

These data delineate, at nucleotide resolution, ~4.5 million compact genomic elements encoding transcription factor occupancy, derived from high-density DNase I cleavage maps of 243 human cell and tissue types and states.

Data included in this trackhub

Metadata describing the individual datasets and cell types is available for download as an XLS file.

Methods

De novo footprinting

To identify DNase I footprints genome-wide, we used a computational approach that incorporates both chromatin architecture and empirically derived DNase I sequence preferences to determine expected per-nucleotide cleavage rates across the genome. For each biosample, we then constructed a statistical model to test whether the observed cleavage rates at individual nucleotides deviated significantly from expectation.

De novo footprint detection was performed in two steps:

  1. Learning a dispersion model used to assign statistical significance to the observed per-nucleotide cleavage rates
  2. Generation of expected cleavages and statistical testing of per-nucleotide cleavage rates within accessible DNA
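
The two steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the published implementation: it assumes a negative binomial dispersion model with a single size parameter estimated by the method of moments, and it tests for depleted cleavage with a lower-tail p-value. The function names (fit_size_moments, nb_pmf, lower_tail_pvalue) and all counts are hypothetical.

```python
import math

def fit_size_moments(counts):
    """Step 1 (assumed form): method-of-moments estimate of the negative
    binomial size parameter r, using var = mean + mean**2 / r."""
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)
    if var <= mean:
        # No overdispersion detected; a large r approximates a Poisson model.
        return 1e6
    return mean ** 2 / (var - mean)

def nb_pmf(k, mu, r):
    """Negative binomial pmf parameterized by mean mu (> 0) and size r."""
    log_p = (math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
             + r * math.log(r / (r + mu)) + k * math.log(mu / (r + mu)))
    return math.exp(log_p)

def lower_tail_pvalue(observed, expected, r):
    """Step 2: P(X <= observed) given the expected cleavage count.
    Small values indicate fewer cleavages than expected (protection)."""
    return min(1.0, sum(nb_pmf(k, expected, r) for k in range(observed + 1)))

# Step 1: learn dispersion from made-up background cleavage counts.
background_counts = [0, 2, 5, 9, 14, 3, 7, 1, 12, 6]
r = fit_size_moments(background_counts)

# Step 2: test made-up observed counts against made-up expected counts.
expected = [6.0, 6.5, 7.0, 6.2]
observed = [5, 0, 1, 7]
pvalues = [lower_tail_pvalue(o, e, r) for o, e in zip(observed, expected)]
```

In this toy example the nucleotides with 0 and 1 observed cleavages receive the smallest p-values, which is the pattern expected at a protected (footprinted) position. A real dispersion model would likely vary with the expected cleavage rate rather than use one global value.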

Consensus footprinting

We implemented an empirical Bayes framework that estimates the posterior probability that a given nucleotide is footprinted by incorporating a prior on the presence of a footprint (determined by footprints independently identified within individual datasets) and a likelihood model of cleavage rates for both occupied and unoccupied sites.
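
As a rough illustration of the posterior calculation described above, the sketch below applies Bayes' rule to a single nucleotide. It assumes the prior is simply the fraction of datasets in which the nucleotide was called as footprinted, and that the two likelihood values are supplied by some cleavage model; both choices, the function names, and the numbers are assumptions made for this example.

```python
def empirical_prior(calls):
    """Fraction of datasets in which the nucleotide fell inside an
    independently called footprint (calls is a list of 0/1 flags)."""
    return sum(calls) / len(calls)

def footprint_posterior(prior, lik_occupied, lik_unoccupied):
    """Posterior probability that the nucleotide is footprinted, given
    the likelihood of the observed cleavage under each state."""
    numerator = prior * lik_occupied
    denominator = numerator + (1.0 - prior) * lik_unoccupied
    if denominator == 0.0:
        return 0.0  # neither state explains the data; report no support
    return numerator / denominator

# Hypothetical nucleotide: footprinted in 2 of 10 datasets, and the
# observed cleavage is 5x more likely under the occupied model.
prior = empirical_prior([1, 0, 0, 1, 0, 0, 0, 0, 0, 0])
posterior = footprint_posterior(prior, lik_occupied=0.05, lik_unoccupied=0.01)
```

Here a prior of 0.2 combined with a fivefold likelihood ratio gives a posterior of about 0.56, showing how evidence from one dataset can raise or lower the prior learned across datasets.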

We applied this approach to all DNase I hypersensitive sites (DHSs) detected within one or more of the 243 biosamples, and used a consensus approach [1] to collate overlapping footprinted regions across individual biosamples into distinct high-resolution footprints (i.e., consensus footprints).
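
To give a feel for what collating overlapping regions involves, the sketch below merges overlapping intervals from several biosamples. It is a deliberately naive stand-in and not the consensus approach of [1], which is more involved. Coordinates are assumed to be half-open, BED-style (chrom, start, end) tuples, and the function name and intervals are hypothetical.

```python
def merge_overlapping(intervals):
    """Merge overlapping half-open (chrom, start, end) intervals.
    Intervals that merely touch (end == next start) stay separate."""
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start < merged[-1][2]:
            # Overlaps the previous region on the same chromosome: extend it.
            merged[-1][2] = max(merged[-1][2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(region) for region in merged]

# Made-up footprint calls pooled from several biosamples.
pooled = [
    ("chr1", 110, 130),
    ("chr1", 100, 120),
    ("chr1", 130, 140),
    ("chr2", 5, 10),
]
consensus = merge_overlapping(pooled)
```

Plain merging tends to chain nearby footprints into overly wide regions, which is one reason a dedicated consensus procedure is preferable for producing distinct high-resolution elements.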

Data availability

Footprint data are publicly available at https://www.vierstra.org/resources/dgf or on Zenodo (DOI: 10.5281/zenodo.3603548).

Code and documentation are available on GitHub and Read the Docs.

Credits

This work was supported by NHGRI grants U54HG007010 and 5UM1HG009444.

Please direct any questions/comments/inquiries to Jeff Vierstra (jvierstra@altius.org).

Citation

If you use these data in your research, please cite:

Vierstra, J., Lazar, J., Sandstrom, R. et al. Global reference mapping of human transcription factor footprints. Nature 583, 729–736 (2020).

References

[1] Meuleman, W., Muratov, A., Rynes, E. et al. Index and biological spectrum of human DNase I hypersensitive sites. Nature 584, 244–251 (2020).