ENCODE DNase I allelic imbalance data
---------------------------
If you have any questions, please reach us at:
sabramov@altius.org Sergey Abramov
sboytsov@altius.org Alexandr Boytsov
---------------------------
Accessing the data
The data is stored at https://resources.altius.org/~jvierstra/projects/encode4-allelic-imbalance-v1
The directory contains:
Metadata file: metadata.tsv
Metadata file with ENCODE IDs: metadata+encode_id.tsv
All tested variants: cav_pvalues.1010.melt.sorted.bed.gz
This readme file: README.tsv
Jupyter notebook: ENCODE DNASE I Allelic Imbalance - 2022-10-6.ipynb
phase1 genotypes (before merging): genotypes.phase1.vcf.gz
---------------------------
Metadata file
ag_id - a unique identifier of a sample
indiv_id - individual genotype ID. Samples with the same indiv_id have similar genotypes (but not necessarily the same cell type)
---------------------------
Genotypes:
Genotypes are provided in VCF format; each sample ID corresponds to an ag_id in the metadata file. The sample columns are sorted in the same order as the rows of the metadata file.
---------------------------
Variants format:
Variants are stored in a BED-like format:
#chr, start, end: genomic position of the SNV, hg38 genome assembly;
ID: dbSNP rs ID of the SNV (dbSNP build 151);
ref: reference allele (A, C, G, or T, according to hg38);
alt: alternative allele;
ref_counts, alt_counts: number of reads mapped to each allele;
sample_id: unique identifier of the sample, corresponds to ag_id in the metadata file;
BAD: background allelic dosage estimate at the variant. Higher BAD values correspond to a higher contribution of aneuploidy and local copy-number variants. BAD scores serve as a baseline when estimating the statistical significance and the effect size of each tested variant. BAD=1 for a diploid region and BAD=2 for a triploid region;
es: effect size of allelic imbalance at the variant, on a log2 scale and corrected for BAD. Positive values indicate imbalance favouring the reference allele;
pval_ref, pval_alt: allele-wise P-values from one-sided tests. A lower P-value indicates preference for the corresponding allele, e.g. a low pval_ref means the variant is imbalanced towards the reference allele.

You can download the variants in a specific region using tabix:
tabix https://resources.altius.org/~jvierstra/projects/encode4-allelic-imbalance-v1/cav_pvalues.1010.melt.sorted.bed.gz ${region} > cav_pvalues_${region}.bed
---------------------------
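Reading the variants file in Python
The sketch below shows one possible way to parse a record of the variants file and to read off which allele is preferred. It is not taken from the accompanying notebook. It assumes that the file is tab-separated and that the columns appear in exactly the order listed under "Variants format"; the names Variant, parse_variant_line and preferred_allele, and the 0.05 threshold, are hypothetical and chosen only for illustration.

```python
import math
from typing import NamedTuple, Optional


class Variant(NamedTuple):
    # Field order mirrors the column list in this readme (assumed file order).
    chrom: str
    start: int
    end: int
    rs_id: str
    ref: str
    alt: str
    ref_counts: int
    alt_counts: int
    sample_id: str  # corresponds to ag_id in the metadata file
    bad: float
    es: float
    pval_ref: float
    pval_alt: float


def _to_float(text: str) -> float:
    """Parse a float; anything non-numeric is treated as NaN (an assumption)."""
    try:
        return float(text)
    except ValueError:
        return math.nan


def parse_variant_line(line: str) -> Optional[Variant]:
    """Parse one tab-separated record; return None for header or blank lines."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    f = line.split("\t")
    if len(f) < 13:
        raise ValueError(f"expected at least 13 columns, got {len(f)}")
    return Variant(
        f[0], int(f[1]), int(f[2]), f[3], f[4], f[5],
        int(f[6]), int(f[7]), f[8],
        _to_float(f[9]), _to_float(f[10]), _to_float(f[11]), _to_float(f[12]),
    )


def preferred_allele(v: Variant, alpha: float = 0.05) -> Optional[str]:
    """Return "ref" or "alt" if the corresponding one-sided P-value is below
    alpha, otherwise None. alpha is an illustrative threshold only; no
    multiple-testing correction is applied here."""
    if v.pval_ref < alpha and v.pval_ref <= v.pval_alt:
        return "ref"
    if v.pval_alt < alpha:
        return "alt"
    return None
```

To use it on a region downloaded with tabix, iterate over the lines of the resulting cav_pvalues_${region}.bed file, pass each line to parse_variant_line, and skip the None results. Records with missing P-values yield None from preferred_allele, because comparisons with NaN are always false.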