These data delineate, at nucleotide resolution, ~4.5 million compact genomic elements encoding transcription factor occupancy, derived from high-density DNase I cleavage maps from 243 human cell and tissue types and states.
Metadata describing the individual datasets and cell types are available for download as an XLS file.
To identify DNase I footprints genome-wide, we used a computational approach incorporating both chromatin architecture and empirically derived DNase I sequence preferences to determine expected per-nucleotide cleavage rates across the genome, and to construct, for each biosample, a statistical model for testing whether observed cleavage rates at individual nucleotides deviated significantly from expectation.
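As a rough illustration of testing observed cleavage against expectation, the sketch below computes a lower-tail p-value for a single nucleotide. It assumes a Poisson count model purely for simplicity; the distribution actually used is not stated here, and the function name poisson_lower_tail and its parameters are hypothetical.

```python
import math


def poisson_lower_tail(observed: int, expected: float) -> float:
    """Return P(X <= observed) for X ~ Poisson(expected).

    A small value means far fewer cleavages were seen than expected,
    which is the signature of a protected (footprinted) nucleotide.
    The Poisson model is an assumption made for this sketch.
    """
    if expected <= 0:
        return 1.0
    term = math.exp(-expected)  # P(X = 0)
    total = term
    for k in range(1, observed + 1):
        term *= expected / k  # P(X = k) from P(X = k - 1)
        total += term
    return min(total, 1.0)


# Hypothetical example: 1 observed cleavage where 12 were expected.
p_value = poisson_lower_tail(observed=1, expected=12.0)
print(f"lower-tail p-value: {p_value:.2e}")
```

A one-sided lower-tail test is used because a footprint shows up as a depletion of cleavage relative to expectation, not an excess.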
De novo footprint detection was performed in two steps:
We implemented an empirical Bayes framework that estimates the posterior probability that a given nucleotide is footprinted by incorporating a prior on the presence of a footprint (determined by footprints independently identified within individual datasets) and a likelihood model of cleavage rates for both occupied and unoccupied sites.
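The posterior described above follows from Bayes' rule applied to two hypotheses, occupied and unoccupied. The sketch below is a minimal illustration of that calculation only; the function name and the example numbers are hypothetical, and how the prior and the two likelihoods are actually estimated is not shown.

```python
def footprint_posterior(prior: float,
                        lik_occupied: float,
                        lik_unoccupied: float) -> float:
    """Posterior probability that a nucleotide is footprinted.

    prior          -- prior probability of a footprint at this nucleotide
    lik_occupied   -- likelihood of the observed cleavage if occupied
    lik_unoccupied -- likelihood of the observed cleavage if unoccupied
    """
    numerator = prior * lik_occupied
    denominator = numerator + (1.0 - prior) * lik_unoccupied
    if denominator == 0.0:
        # Neither hypothesis explains the data; fall back to the prior.
        return prior
    return numerator / denominator


# Hypothetical values: a modest prior combined with data that are
# nine times more likely under occupancy than under no occupancy.
print(footprint_posterior(prior=0.1, lik_occupied=0.9, lik_unoccupied=0.1))
```

When the two likelihoods are equal the data carry no information and the posterior equals the prior, which is a quick sanity check on any implementation.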
We applied this approach to all DHSs detected within one or more of the 243 biosamples, and used a consensus approach [1] to collate overlapping footprinted regions across individual biosamples into distinct high-resolution footprints (i.e., consensus footprints).
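To give a feel for collating overlapping regions across biosamples, the sketch below merges overlapping intervals on one chromosome into their union. This is a deliberately simplified stand-in, not the consensus approach of [1]; the function name and the half-open (start, end) coordinate convention are assumptions.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]  # half-open (start, end), BED-style (assumed)


def merge_overlapping(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping intervals from one chromosome into their unions."""
    merged: List[List[int]] = []
    for start, end in sorted(intervals):
        if merged and start < merged[-1][1]:
            # Overlaps the current merged region: extend it if needed.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


# Hypothetical footprints called in three biosamples at the same locus.
calls = [(100, 112), (105, 118), (140, 151)]
print(merge_overlapping(calls))
```

A plain union tends to fuse neighboring footprints into one wide region, which is one reason a dedicated consensus procedure is preferable when high-resolution boundaries matter.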
Footprint data are publicly available at https://www.vierstra.org/resources/dgf or on Zenodo (DOI: 10.5281/zenodo.3603548).
Code and documentation are available on GitHub and Read the Docs.
This work was supported by NHGRI grants U54HG007010 and 5UM1HG009444.
Please direct any questions/comments/inquiries to Jeff Vierstra (jvierstra@altius.org).
[1] Vierstra, J., Lazar, J., Sandstrom, R. et al. Global reference mapping of human transcription factor footprints. Nature 583, 729–736 (2020).