2021 Dec 2;108(12):2271-2283.
doi: 10.1016/j.ajhg.2021.11.004.

Phenome risk classification enables phenotypic imputation and gene discovery in developmental stuttering

Douglas M Shaw et al. Am J Hum Genet.

Abstract

Developmental stuttering is a speech disorder characterized by disruption in the forward movement of speech. This disruption includes part-word and single-syllable repetitions, prolongations, and involuntary tension that blocks syllables and words, and the disorder has a lifetime prevalence of 6-12%. Within Vanderbilt's electronic health record (EHR)-linked biorepository (BioVU), only 142 individuals out of 92,762 participants (0.15%) are identified with diagnostic ICD9/10 codes, suggesting a large portion of people who stutter do not have a record of diagnosis within the EHR. To identify individuals affected by stuttering within our EHR, we built a PheCode-driven Gini impurity-based classification and regression tree model, PheML, by using comorbidities enriched in individuals affected by stuttering as predicting features and imputing stuttering status as the outcome variable. Applying PheML in BioVU identified 9,239 genotyped affected individuals (a clinical prevalence of ∼10%) for downstream genetic analysis. Ancestry-stratified GWAS of PheML-imputed affected individuals and matched control individuals identified rs12613255, a variant near CYRIA on chromosome 2 (β = 0.323; p value = 1.31 × 10−8), in European-ancestry analysis and rs7837758 (β = 0.518; p value = 5.07 × 10−8), an intronic variant found within the ZMAT4 gene on chromosome 8, in African-ancestry analysis. Polygenic-risk prediction and concordance analysis in an independent clinically ascertained sample of developmental stuttering cases validate our GWAS findings in PheML-imputed affected and control individuals and demonstrate the clinical relevance of our population-based analysis for stuttering risk.
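The abstract describes PheML as a Gini impurity-based classification and regression tree that uses comorbidities as predicting features. As a rough illustration of that split criterion only, the sketch below computes Gini impurity and picks the binary feature whose split yields the lowest weighted impurity. It is not the authors' implementation: the feature names (`code_a`, `code_b`), the toy records, and the helper functions are all hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k^2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def split_impurity(rows, labels, feature):
    """Weighted Gini impurity after splitting on a binary (0/1) feature."""
    left = [y for x, y in zip(rows, labels) if x[feature] == 0]
    right = [y for x, y in zip(rows, labels) if x[feature] == 1]
    n = len(labels)
    return (len(left) / n) * gini(left) + (len(right) / n) * gini(right)


def best_split(rows, labels):
    """Return the feature whose split gives the lowest weighted impurity."""
    return min(rows[0].keys(), key=lambda f: split_impurity(rows, labels, f))


# Hypothetical toy data: each row marks presence (1) or absence (0) of a
# made-up comorbidity code; labels mark affected (1) or control (0).
rows = [
    {"code_a": 1, "code_b": 0},
    {"code_a": 1, "code_b": 1},
    {"code_a": 0, "code_b": 1},
    {"code_a": 0, "code_b": 0},
]
labels = [1, 1, 0, 0]

print(gini(labels))
print(best_split(rows, labels))
```

In this toy set, `code_a` separates the two classes perfectly, so a tree would split on it first. A full tree repeats the same selection recursively on each resulting subset.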

Conflict of interest statement

Declaration of interests: The authors declare no competing interests.

Figures

Figure 1
Outline of PheML development and application. Within a set of 3.1 million deidentified electronic health records (A), we first identified a small pool of subjects (B) with developmental stuttering through expert manual review. We selected these patients and their demographically matched controls to identify comorbidities as predictive features and develop and test a machine-learning model (C) that would impute stuttering in BioVU (D), an independent EHR dataset linked to genetic data. We then performed a GWAS by using the imputed phenotype as the dependent variable in the labeled genetic dataset (E) to identify genetic variants associated with imputed stuttering (F).
Figure 2
Manhattan plot and QQ plot of results from GWAS of European-ancestry individuals predicted by PheML to exhibit developmental stuttering. Analysis included 7,751,954 variants across chromosomes 1–22. One locus on chromosome 2 reached genome-wide significance (p < 5 × 10−8); the sentinel variant, rs12613255 (BETA = 0.323; p = 1.31 × 10−8), was 113 kb 3' of CYRIA (FAM49A is an alias for CYRIA). The red line indicates the threshold for genome-wide significance (5.0 × 10−8), and the blue line indicates the threshold for suggestive significance (1.0 × 10−5). Loci reported in Table 4 are labeled on the plot, along with the nearest gene.
Figure 3
LocusZoom plot for the rs12613255 locus in the EUR PheML stuttering GWAS. The lead variant (marked as a diamond) was found on chromosome 2, 113 kb 3' of CYRIA. A dashed line indicates the threshold for genome-wide significance (5.0 × 10−8).
Figure 4
Manhattan plot and QQ plot of results from GWAS of African-ancestry individuals predicted by PheML to exhibit developmental stuttering. Analysis included 13,643,593 variants across chromosomes 1–22. One variant, rs7837758, reached genome-wide significance (BETA = 0.518; p = 5.07 × 10−8), on chromosome 8 within the third intron of ZMAT4. The red line indicates the threshold for genome-wide significance (5.0 × 10−8), and the blue line indicates the threshold for suggestive significance (1.0 × 10−5). Loci reported in Table 4 are labeled on the plot, along with the nearest gene.
Figure 5
LocusZoom plot for the rs7837758 locus in the AFR PheML stuttering GWAS. The lead variant (marked as a diamond) was found on chromosome 8, within the third intron of ZMAT4. A dashed line indicates the threshold for genome-wide significance (5.0 × 10−8).
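The figure captions refer to two reference lines on the Manhattan plots: a genome-wide significance threshold (5.0 × 10−8) and a suggestive threshold (1.0 × 10−5). A minimal sketch of how a p value could be sorted into tiers against those two thresholds is shown below. The function name, the tier labels, and the use of strict inequalities (following the "p < 5 × 10−8" wording in the Figure 2 caption) are assumptions for illustration, not a description of the authors' pipeline.

```python
# Thresholds as stated in the figure captions.
GENOME_WIDE = 5.0e-8
SUGGESTIVE = 1.0e-5


def significance_tier(p_value):
    """Classify a GWAS p value against the two plotted thresholds.

    Hypothetical helper: strict '<' comparisons are an assumption.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p value must lie in [0, 1]")
    if p_value < GENOME_WIDE:
        return "genome-wide"
    if p_value < SUGGESTIVE:
        return "suggestive"
    return "not significant"


# p value reported for rs12613255 in the European-ancestry analysis.
print(significance_tier(1.31e-8))
# Made-up values for illustration only.
print(significance_tier(3.0e-6))
print(significance_tier(0.2))
```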
