Network Analysis and Visualization of Disease Multimorbidity Using Electronic Health Records and Genetic Biobank Data

dc.contributor.advisor: Xu, Yaomin
dc.contributor.committeeChair: Shotwell, Matthew S.
dc.creator: Zhang, Siwei
dc.creator.orcid: 0009-0005-0873-5217
dc.date.accessioned: 2025-09-26T11:10:39Z
dc.date.created: 2025-08
dc.date.issued: 2025-05-28
dc.date.submitted: August 2025
dc.description.abstract: Disease multimorbidity, the co-occurrence of multiple diseases within an individual, presents complex challenges for both public health and precision medicine. Advancing our understanding of multimorbidity can illuminate disease mechanisms, reveal patient heterogeneity, and enable biomarker discovery and treatment repurposing. Large-scale Electronic Health Records (EHR) and EHR-linked genetic biobanks offer unique opportunities to quantify phenome-wide multimorbidity, uncover shared genetic mechanisms among co-occurring conditions, and define multimorbidity-based disease clusters. However, major analytical and methodological challenges remain. To address these challenges, we present three key contributions. First, we introduce a phenome-wide multimorbidity network that quantifies nonrandom disease-disease co-occurrences while accounting for potential confounding factors. Second, we develop a genetic discovery platform that integrates polygenic scores for predicted transcriptomic, proteomic, and metabolomic traits with phenome-wide association studies (PheWAS) to uncover shared biological mechanisms among multimorbid conditions. To support exploration of these relationships, we also develop an interactive network visualization tool featuring dynamic cluster analysis of biological pathways linked to diseases with similar multimorbidity patterns, giving users an intuitive view of complex disease relationships and their shared biological mechanisms. Third, we propose a model-based clustering framework using a bipartite stochastic block model (biSBM) with a stability-driven post-processing step to identify robust disease clusters and patient subgroups from individual-level EHR data. This framework demonstrates superior performance in simulations and replicates coherent, interpretable multimorbidity structures across independent datasets, including UK Biobank and Vanderbilt BioVU.
A case study of JAK2V617F somatic mutation carriers reveals genetic heterogeneity across patient subgroups with distinct multimorbidity patterns, illustrating the potential of our data-driven approach to uncover mechanistic insights into patient heterogeneity through EHR-derived multimorbidity networks.
dc.format.mimetype: application/pdf
dc.identifier.uri: https://hdl.handle.net/1803/19890
dc.language.iso: en
dc.subject: Network analysis
dc.subject: Disease multimorbidity
dc.subject: Electronic Health Records
dc.subject: Genetic Biobank
dc.title: Network Analysis and Visualization of Disease Multimorbidity Using Electronic Health Records and Genetic Biobank Data
dc.type: Thesis
dc.type.material: text
local.embargo.lift: 2027-08-01
local.embargo.terms: 2027-08-01
thesis.degree.discipline: Biostatistics
thesis.degree.grantor: Vanderbilt University Graduate School
thesis.degree.level: Doctoral
thesis.degree.name: PhD