Integrating Large-Scale Human Genetic and Regulatory Genomic Data to Functionally Annotate CTCF Binding Variation

dc.contributor.advisor: Ruderfer, Douglas M
dc.contributor.committeeChair: Gamazon, Eric
dc.creator: Tubbs, Colby
dc.creator.orcid: 0000-0001-7676-8735
dc.date.accessioned: 2025-06-06T09:45:11Z
dc.date.available: 2025-06-06T09:45:11Z
dc.date.created: 2025-05
dc.date.issued: 2025-03-11
dc.date.submitted: May 2025
dc.description.abstract: CCCTC-binding factor (CTCF) regulates gene expression through DNA binding at thousands of genomic loci. Genetic variation in these CTCF binding sites (CBSs) is an important driver of phenotypic variation, yet identifying the variants that are likely to have functional consequences in whole genome sequencing (WGS) data remains challenging. In this dissertation, I explore conceptual frameworks to identify and prioritize CBS variants in gnomAD, a WGS database consisting of 76,156 individuals. First, I integrate computational and experimental predictions of CTCF binding into an empirical false-positive measure that can be applied to the score distribution of a position weight matrix. I then synthesize CTCF’s binding patterns at 1,063,878 genomic loci across 214 biological contexts into a summary of binding activity. This measure correlates with both conserved nucleotides and sequences that contain high-quality CTCF binding motifs. Finally, I use binding activity to evaluate high-confidence allelic binding predictions for 1,253,329 SNVs in gnomAD that disrupt a CBS. I find a strong, positive relationship between the mutability-adjusted proportion of singletons (MAPS) metric and the loss of CTCF binding at loci with high in vitro activity. Together, this body of work nominates thousands of rare, noncoding variants that disrupt CTCF binding for further functional studies while providing a blueprint for synthesizing large-scale genomic data to better prioritize noncoding variation in human disease studies.
dc.format.mimetype: application/pdf
dc.identifier.uri: https://hdl.handle.net/1803/19746
dc.language.iso: en
dc.subject: CTCF, noncoding, variation, disease, selection, functional, annotate, prioritize, transcription, factor, binding, constraint, SNV, ENCODE, gnomAD
dc.title: Integrating Large-Scale Human Genetic and Regulatory Genomic Data to Functionally Annotate CTCF Binding Variation
dc.type: Thesis
dc.type.material: text
thesis.degree.discipline: Human Genetics
thesis.degree.grantor: Vanderbilt University Graduate School
thesis.degree.level: Doctoral
thesis.degree.name: PhD
Files
Original bundle
Name: TUBBS-DISSERTATION-2025.pdf
Size: 3.19 MB
Format: Adobe Portable Document Format