Consensus digital genomic footprints from ENCODE DNase I data

Description

These data delineate, at nucleotide resolution, ~4.5 million compact genomic elements encoding transcription factor occupancy, derived from high-density DNase I cleavage maps of 243 human cell and tissue types and states.

Data included in this trackhub

Per-nucleotide DNase I cleavage and associated statistics for 243 biosamples
De novo footprints identified at various false discovery rate thresholds within each dataset
Consensus footprints derived from all biosamples
Motif models overlapping consensus footprints

Metadata describing the individual datasets and cell types are available for download as an XLS file.

Methods

De novo footprinting

To identify DNase I footprints genome-wide, we used a computational approach that incorporates both chromatin architecture and empirically derived DNase I sequence preferences to determine expected per-nucleotide cleavage rates across the genome. For each biosample, we then constructed a statistical model to test whether observed cleavage rates at individual nucleotides deviated significantly from expectation.

De novo footprint detection was performed in two steps:

Learning a dispersion model used to assign statistical significance to the observed per-nucleotide cleavage rates
Generation of expected cleavage counts and per-nucleotide statistical testing of cleavage rates within accessible DNA
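
The second step can be illustrated with a minimal sketch. It assumes, for illustration only, that per-nucleotide cleavage counts follow a negative binomial distribution whose mean is the expected cleavage count and whose size parameter r comes from the dispersion model; the function names, the value of r, and the example counts are hypothetical and are not taken from the released code.

```python
import math

def nb_log_pmf(k, mu, r):
    """Log-probability of k cleavages under a negative binomial
    with mean mu (> 0) and size (inverse dispersion) r (> 0)."""
    p = r / (r + mu)
    return (math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
            + r * math.log(p) + k * math.log1p(-p))

def depletion_pvalue(observed, expected, r):
    """Lower-tail p-value P(X <= observed).
    Small values indicate fewer cleavages than expected,
    which is the signature of a protected (footprinted) nucleotide."""
    total = sum(math.exp(nb_log_pmf(k, expected, r))
                for k in range(observed + 1))
    return min(1.0, total)

# Hypothetical observed and expected cleavage counts at six nucleotides
observed = [12, 9, 1, 0, 2, 11]
expected = [10.0, 10.0, 10.0, 10.0, 10.0, 10.0]

# r = 5.0 is an assumed dispersion value, not a fitted one
pvals = [depletion_pvalue(o, e, r=5.0) for o, e in zip(observed, expected)]
```

Nucleotides with strongly depleted cleavage (the third to fifth positions in this toy example) receive the smallest p-values. In practice the dispersion would be learned per biosample, as described in the first step.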

Consensus footprinting

We implemented an empirical Bayes framework that estimates the posterior probability that a given nucleotide is footprinted by incorporating a prior on the presence of a footprint (determined by footprints independently identified within individual datasets) and a likelihood model of cleavage rates for both occupied and unoccupied sites.
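
The posterior calculation described above is an application of Bayes' rule, sketched below. The function name is hypothetical, and deriving the prior from the fraction of biosamples with an independently called footprint is an assumption made for this example; the likelihood values are placeholders rather than outputs of the actual cleavage model.

```python
def footprint_posterior(prior, lik_occupied, lik_unoccupied):
    """Posterior probability that a nucleotide is footprinted.

    prior          -- prior probability of a footprint at this nucleotide
    lik_occupied   -- likelihood of the observed cleavage if occupied
    lik_unoccupied -- likelihood of the observed cleavage if unoccupied
    """
    numerator = prior * lik_occupied
    denominator = numerator + (1.0 - prior) * lik_unoccupied
    if denominator == 0.0:
        # Neither hypothesis explains the data; fall back to the prior
        return prior
    return numerator / denominator

# Assumed prior: footprint called independently in 30 of 243 biosamples
prior = 30 / 243

# Placeholder likelihoods for one nucleotide in one biosample
posterior = footprint_posterior(prior, lik_occupied=0.8, lik_unoccupied=0.2)
```

When the observed cleavage is better explained by the occupied model, the posterior rises above the prior; when the unoccupied model fits better, it falls below.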

We applied this approach to all DNase I hypersensitive sites (DHSs) detected within one or more of the 243 biosamples, and used a consensus approach [1] to collate overlapping footprinted regions across individual biosamples into distinct high-resolution footprints (i.e., consensus footprints).
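
To give a sense of what collating overlapping regions involves, the sketch below merges overlapping footprint calls from several biosamples into a single region. This is a plain interval union used as a simplified stand-in; it is not the consensus procedure of reference [1], and the function name and coordinates are hypothetical.

```python
def merge_footprints(footprints):
    """Merge overlapping (chrom, start, end) intervals.
    Coordinates are half-open, so intervals that merely touch
    (start == previous end) are kept separate."""
    merged = []
    for chrom, start, end in sorted(footprints):
        if merged and merged[-1][0] == chrom and start < merged[-1][2]:
            # Overlaps the previous region: extend it
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            # New chromosome or no overlap: start a new region
            merged.append((chrom, start, end))
    return merged

# Hypothetical footprint calls pooled from several biosamples
calls = [
    ("chr1", 100, 112),
    ("chr1", 108, 120),
    ("chr1", 150, 160),
    ("chr2", 100, 110),
    ("chr1", 118, 125),
]
consensus = merge_footprints(calls)
```

A simple union like this tends to chain neighboring footprints into long regions, which is one reason a dedicated consensus method is needed to keep the resulting elements compact.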

Data availability

Footprint data are publicly available at https://www.vierstra.org/resources/dgf or Zenodo (DOI: 10.5281/zenodo.3603548).

Code and documentation are available on GitHub and Read the Docs.

Credits

This work was supported by NHGRI grants U54HG007010 and 5UM1HG009444.

Please direct any questions/comments/inquiries to Jeff Vierstra (jvierstra@altius.org).

Citation

If you use these data in your research, please cite:

Vierstra, J., Lazar, J., Sandstrom, R. et al. Global reference mapping of human transcription factor footprints. Nature 583, 729–736 (2020).

References

[1] Meuleman, W., Muratov, A., Rynes, E. et al. Index and biological spectrum of human DNase I hypersensitive sites. Nature 584, 244–251 (2020).