Integrating Large-Scale Human Genetic and Regulatory Genomic Data to Functionally Annotate CTCF Binding Variation

Date
2025-03-11
Abstract
CCCTC-binding factor (CTCF) regulates gene expression through DNA binding at thousands of genomic loci. Genetic variation in these CTCF binding sites (CBSs) is an important driver of phenotypic variation, yet identifying the variants that are likely to have functional consequences in whole genome sequencing (WGS) data remains challenging. Through this dissertation, I explore conceptual frameworks to identify and prioritize CBS variants in gnomAD, a WGS database consisting of 76,156 individuals. First, I integrate computational and experimental predictions of CTCF binding into an empirical false-positive measure that can be applied to the score distribution of a position weight matrix. I then synthesize CTCF’s binding patterns at 1,063,878 genomic loci across 214 biological contexts into a summary of binding activity. This measure correlates with both conserved nucleotides and sequences that contain high-quality CTCF binding motifs. Finally, I use binding activity to evaluate high-confidence allelic binding predictions for 1,253,329 SNVs in gnomAD that disrupt a CBS. I find a strong, positive relationship between the mutability-adjusted proportion of singletons (MAPS) metric and the loss of CTCF binding at loci with high in vitro activity. Together, this body of work nominates thousands of rare, noncoding variants that disrupt CTCF binding for further functional studies while providing a blueprint for synthesizing large-scale genomic data to better prioritize noncoding variation in human disease studies.
Keywords
CTCF, noncoding, variation, disease, selection, functional, annotate, prioritize, transcription, factor, binding, constraint, SNV, ENCODE, gnomAD