The shared genetic architecture and evolution of human language and musical rhythm

Alagöz, Gökberk; Eising, Else; Mekki, Yasmina; Bignardi, Giacomo; Fontanillas, Pierre; Nivard, Michel G.; Luciano, Michelle; Cox, Nancy J.; Fisher, Simon E.; Gordon, Reyna L.

doi:10.1038/s41562-024-02051-y

Article
Open access
Published: 21 November 2024

Nature Human Behaviour volume 9, pages 376–390 (2025)

This article has been updated

Abstract

This study aimed to test theoretical predictions over biological underpinnings of previously documented phenotypic correlations between human language-related and musical rhythm traits. Here, after identifying significant genetic correlations between rhythm, dyslexia and various language-related traits, we adapted multivariate methods to capture genetic signals common to genome-wide association studies of rhythm (N = 606,825) and dyslexia (N = 1,138,870). The results revealed 16 pleiotropic loci (P < 5 × 10⁻⁸) jointly associated with rhythm impairment and dyslexia, and intricate shared genetic and neurobiological architectures. The joint genetic signal was enriched for foetal and adult brain cell-specific regulatory regions, highlighting complex cellular composition in their shared underpinnings. Local genetic correlation with a key white matter tract (the left superior longitudinal fasciculus-I) substantiated hypotheses about auditory–motor connectivity as a genetically influenced, evolutionarily relevant neural endophenotype common to rhythm and language processing. Overall, we provide empirical evidence of multiple aspects of shared biology linking language and musical rhythm, contributing novel insight into the evolutionary relationships between human musicality and linguistic communication traits.

Main

The human brain has intricate neural circuitry to process complex communicative signals and behaviours, including speech and music, and the extent of biological overlap between these facets is an important question for the field of neurobiology. Individual differences in rhythm-related skills are correlated with variability in language-related skills, implicating potentially shared underlying neural and genetic architectures¹. Previous research on the relationship between rhythm and language-related skills used a wide range of task-based tests to measure aspects of rhythm (for example, beat synchronization, rhythm perception and production, and metrical perception) and spoken and written language abilities (for example, word recognition, spelling and phonological awareness). For instance, rhythm perception skills can be measured by quantifying participants’ ability to discriminate differences in durations of adjacent tones in melodies, whereas beat synchronization skills can be quantified on the basis of participants’ success in tapping along to a metronome² or to the beat (pulse) of musical excerpts³. Similarly, language-related skills, such as word reading ability, can be measured as the ability to sound out words quickly and accurately in a limited amount of time⁴, and spelling skills can be measured by testing participants’ abilities to correctly spell out a number of words read aloud by the tester⁵. Importantly, even though language and rhythm measurement tasks involve different signals and stimuli and capture different skills, studies of individual differences often show phenotypic correlations between the different traits⁶. Nayak et al.¹ compiled information from 25 studies that identified significant positive phenotypic correlations between rhythm and language-related skills, synthesizing findings on a total of 397 children and 606 adults. 
Consistent associations have been found between rhythm perception, beat synchronization and language-related skills including speech perception, word reading and grammatical skills at various phases in the lifespan from pre-school age through to later adulthood. Despite these lines of evidence showing phenotypic associations between non-linguistic rhythmic processing and language skills, empirical evidence at the intersection between the neurobiological, genetic and evolutionary grounds of these traits remains to be discovered.

Various theoretical frameworks^7,8,9, such as the revised vocal learning hypothesis¹⁰, provide overarching perspectives on how rhythm and multiple facets of human communication might relate in a neurodevelopmental and evolutionary context. According to the revised vocal learning hypothesis, human vocal learning ability is a pre-adaptation for predictive and tempo-flexible beat synchronization, and beat processing and vocal learning rely on overlapping neural circuits. This view is in line with neural reuse theories, such as neuronal recycling¹¹ and massive redeployment hypotheses¹², which suggest overlapping neurobiological circuits for language- and rhythm-related skills. Neural reuse theories claim that cultural innovations, such as reading, invade evolutionarily older brain substrates via the reallocation of an existing neural circuit to a new behaviour. Some argue that cognitive systems, such as language and music, are better understood as different uses of similar information processing mechanisms¹³, yet the genetic and evolutionary bases of putative shared neural circuits are largely unknown.

To address prominent theories on the evolution of language development and musical rhythm in humans¹⁰, evidence so far has been taken largely from psychology, neuroscience and cross-species comparisons rather than genetics^14,15. We believe that identifying potential shared genetic architecture between language-related disorders and musical rhythm abilities, and probing the evolutionary past of the implicated genomic regions, can help reveal the neural and biological characteristics of our species that made rhythm and language an asset to human development and behaviour. Importantly, individuals with rhythm impairment have been suggested to show higher predisposition to language-related difficulties, such as developmental language disorder and dyslexia (atypical rhythm risk hypothesis, ARRH)¹⁶. Given that disorders of language and reading can have long-term health impacts, identifying genetic factors that they share with rhythm impairment may enhance future possibilities for diagnoses and treatment. Moreover, basic science concerning the biological substrates of these fundamental human traits will be informed by new approaches to their potentially shared genetic architecture.

Our work built on two recent genome-wide association studies (GWAS) that represent by far the most well-powered genetic investigations of rhythm-/language-relevant traits so far, one for musical rhythm (beat synchronization, hereafter referred to as rhythm; ‘can you clap in time with a musical beat?’ N_cases(yes) = 555,660, N_controls(no) = 51,165)¹⁷ and the other for dyslexia (developmental reading/spelling difficulties; ‘have you been diagnosed with dyslexia?’ N_cases(yes) = 51,800, N_controls(no) = 1,087,070)¹⁸, both performed on a 23andMe Inc. research cohort in individuals of European ancestry and both classified as binary traits. We used the dyslexia GWAS as a proxy for the genetic underpinnings of language- and reading-related aspects of human communication, as dyslexia often co-occurs with a number of speech/language disorders^19,20,21,22. Beat synchronization GWAS was used as a proxy for musical rhythm skills, as beat perception and synchronization are considered to be important features of musical experiences in present-day humans^23,24. We applied a three-stage analytic pipeline to investigate shared genetics and biology: (1) genome-wide genetic correlations between rhythm and dyslexia (as well as other language-related traits) using linkage disequilibrium score regression (LDSC)²⁵, (2) multivariate GWAS (mvGWAS) of rhythm impairment and dyslexia using genomic structural equation modelling (SEM)²⁶ and (3) post-mvGWAS analyses of the shared genomic infrastructure as windows into its evolution and biology (Fig. 1a).

**Fig. 1: Study design and genetic correlations between rhythm and language or reading-related traits.**

Results

In the first stage, we estimated genetic correlations between rhythm and dyslexia, as well as quantitative measures of language or reading performance²⁷, educational traits²⁸ and brain–language-related endophenotypes^29,30 by using LDSC²⁵. We found moderate but significant genetic correlations between rhythm and dyslexia (magnitude of the genetic correlation (r_g) (s.e.m.) = −0.28 (0.02), P_FDR = 2.05 × 10⁻³¹), five quantitative language or reading measures, three educational traits and two language-relevant neuroimaging endophenotypes (Fig. 1b and Supplementary Table 1). In contrast there were negligible and non-significant genetic correlations with non-verbal intelligence quotient (IQ) (r_g (s.e.m.) = −0.004 (0.047), P_FDR = 0.94) and overall school performance (r_g (s.e.m.) = −0.066 (0.040), P_FDR = 0.11) (Fig. 1b and Supplementary Table 1). Thus, rhythm is genetically correlated not only with dyslexia, but also with multiple language-related phenotypes including word and non-word reading, non-word repetition, phoneme awareness, having better language skills than mathematics, and language resting-state functional connectivity (|r_g| median of 0.184 and range of 0.004–0.376), providing empirical genetic evidence for the ARRH. The absence of significant genetic correlations between rhythm and cognitive traits, such as non-verbal IQ and overall school performance, provides evidence that genetic sharing between rhythm and dyslexia is not driven by general cognition. These results represent the first direct empirical support for a shared genetic architecture underlying previously observed phenotypic correlations between rhythm and language-related traits¹, such as dyslexia (Pearson correlation of −0.04, 95% confidence interval (CI) −0.05 to −0.04, t = −25.96, d.f. of 363,285, P < 2.2 × 10⁻¹⁶).
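As a rough illustration of how a genetic correlation estimate and its standard error translate into a P value, the sketch below applies a two-sided Wald z-test to the rounded estimates quoted above for non-verbal IQ and overall school performance. It is not the LDSC software itself, and the function name `wald_p` is hypothetical. Because the inputs are rounded and the reported values are FDR-adjusted, the output should only be of similar magnitude to the reported P_FDR values.

```python
import math

def wald_p(estimate, se):
    """Two-sided P value for H0: estimate = 0, using a normal approximation."""
    z = estimate / se
    return math.erfc(abs(z) / math.sqrt(2))

# Rounded values quoted in the text (r_g, s.e.m.)
p_iq = wald_p(-0.004, 0.047)      # non-verbal IQ
p_school = wald_p(-0.066, 0.040)  # overall school performance
print(f"non-verbal IQ: P = {p_iq:.2f}; school performance: P = {p_school:.2f}")
```

Both unadjusted P values fall above 0.05, in keeping with the non-significant correlations described in the text.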

Given that dyslexia is a neurodevelopmental disorder with effects particularly apparent in the written language domain (evident from reading and/or spelling difficulties)¹⁸ and that other work has shown rhythm impairments associated with dyslexia^19,20,21,22, we expect it to be genetically and phenotypically linked to impairment in rhythm (hereafter referred to as rhythm impairment) rather than rhythm ability (this expectation is supported by the negative sign of the genetic correlation observed in the first stage of our pipeline above). Thus, we reversed the effect directions in the binary rhythm GWAS summary statistics to align genetic effect directions for rhythm and reading impairments. We then performed a mvGWAS on the rhythm impairment and dyslexia GWAS to probe the validity of the ARRH at the genetic level, using a bivariate extension of genomic SEM²⁶ that we developed (Methods). This allowed us to tease apart the genetic effects shared between rhythm impairment and dyslexia from those that are unique to each. We specified a measurement model with a shared genetic factor (F_gRI-D, where RI-D stands for rhythm impairment-dyslexia), which recaptured the genetic correlation between the two traits (σ²F_gRI-D (s.e.m.) = 0.28 (0.03)). Similar to Grotzinger et al.³¹, we then applied both the common pathway model (CPM), in which F_gRI-D is regressed on each single nucleotide polymorphism (SNP) (Supplementary Fig. 1), and the independent pathway model (IPM), in which the genetic components of the two traits are regressed directly on each SNP (Supplementary Fig. 1). Thus, we were able to obtain a quantitative per-SNP score quantifying the extent to which any given SNP influences rhythm impairment or dyslexia independent from F_gRI-D, that is, the bivariate genetic heterogeneity (Q_b).
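The effect-direction reversal described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the column names (`beta`, `se`, `z`, `p`), the function `flip_effect_direction` and the example SNP are all hypothetical, and `beta` is assumed to be a log odds ratio.

```python
import math

def flip_effect_direction(row):
    """Return a copy of one summary-statistic row with the effect direction reversed.

    Assumed (hypothetical) columns: 'beta' is a log odds ratio and 'z' its
    test statistic; 'se' and 'p' do not depend on the sign and are unchanged.
    """
    flipped = dict(row)
    flipped["beta"] = -row["beta"]
    flipped["z"] = -row["z"]
    return flipped

# Made-up SNP, not taken from the study
ability = {"snp": "rs_example", "beta": 0.05, "se": 0.01, "z": 5.0, "p": 5.7e-7}
impairment = flip_effect_direction(ability)

# On the odds-ratio scale, reversing the direction inverts the ratio
or_ability = math.exp(ability["beta"])
or_impairment = math.exp(impairment["beta"])
```

After the flip, an allele that raised the odds of reporting rhythm ability lowers the odds of rhythm impairment by the same factor, so positive effects in both data sets point towards impairment.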

Our mvGWAS analysis with the CPM resulted in a new set of summary statistics representing the genetic overlap between rhythm impairment and dyslexia, and identified 18 genome-wide significant (P < 5 × 10⁻⁸) loci associated with F_gRI-D (Fig. 2a and Supplementary Table 2) after genomic control (GC) correction (Supplementary Fig. 2). We estimated the SNP heritability of F_gRI-D as 13% (s.e.m. of 0.5%) by LDSC²⁵. The strongest mvGWAS signal came from the SNP rs28576629 (P = 3.79 × 10⁻¹⁴) on chromosome 3 (Fig. 2a), an intronic variant in PPP2R3A, a gene encoding a regulatory subunit of protein phosphatase 2 (ref. ³²). We validated the genomic SEM CPM results using two additional mvGWAS methods: (1) N-weighted genome-wide association meta-analysis (GWAMA)³³ and (2) cross-phenotype association analysis (CPASSOC)³⁴. Both methods captured highly similar genomic architectures to the one captured by the CPM (Supplementary Fig. 3), confirming that the shared genetics of rhythm impairment and dyslexia could be identified consistently regardless of analytic tool. The IPM resulted in two new sets of summary statistics capturing the genetic factors of rhythm impairment and dyslexia that are independent from F_gRI-D, so-called independent factors (Supplementary Fig. 4). We used the IPM results to obtain Q_b and mapped the per-SNP Q_b scores onto CPM mvGWAS results to dissociate the homogeneous (hereafter referred to as pleiotropic) signals from the signals driven by a single GWAS (Fig. 2a). We identified 27 genome-wide significant (P < 5 × 10⁻⁸) heterogeneous loci in the Q_b results (Fig. 2a and Supplementary Table 3), and two of these loci are co-localized with two CPM signals on chromosome 20 (30,690,943–31,189,993 and 47,514,881–47,821,129), which are mvGWAS signals that are driven by the dyslexia GWAS (Fig. 2a).
Thus, our analysis revealed two distinct patterns for CPM mvGWAS hit loci: 16 highly homogeneous (putatively pleiotropic) and two heterogeneous loci indicating different levels of GWAS significance, effect sizes and/or opposite effect directions for these two loci in the rhythm impairment and dyslexia GWASs (Fig. 2b shows the representative loci of each type).
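To give a feel for how a sample-size-weighted meta-analysis combines two GWASs at a single SNP, the sketch below implements a generic weighted Z-score combination with a correction for correlated test statistics. It is a simplified stand-in, not the published N-weighted GWAMA or CPASSOC implementations, which may differ in their choice of weights and in how sample overlap is estimated. The function `weighted_z`, the example Z-scores and the null correlation of 0.5 are assumptions chosen for illustration.

```python
import math

def weighted_z(z_scores, weights, null_corr):
    """Combine per-trait Z-scores for one SNP.

    null_corr[i][j] is the assumed correlation between Z-scores i and j
    under the null (for example, induced by overlapping samples).
    """
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    variance = 0.0
    for i, wi in enumerate(weights):
        for j, wj in enumerate(weights):
            variance += wi * wj * (1.0 if i == j else null_corr[i][j])
    return numerator / math.sqrt(variance)

# Equal weights: uncorrelated versus partially correlated test statistics
independent = weighted_z([2.0, 2.0], [1.0, 1.0], [[1, 0], [0, 1]])
overlapping = weighted_z([2.0, 2.0], [1.0, 1.0], [[1, 0.5], [0.5, 1]])

# Weights proportional to sqrt(N), using the sample sizes reported in the text
n_weights = [math.sqrt(606_825), math.sqrt(1_138_870)]
combined = weighted_z([3.0, 4.0], n_weights, [[1, 0.5], [0.5, 1]])
```

Ignoring the correlation between the two sets of test statistics would overstate the combined evidence. This matters here because both GWASs were run in the same research cohort.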

**Fig. 2: Manhattan plots for univariate and mvGWASs and heterogeneity, including examples of highly homogeneous and heterogeneous loci in F_gRI-D results.**

Next, we performed a transcriptome-wide association study (TWAS) using F_gRI-D summary statistics, and whole-blood and 13 GTEx brain tissue phenotype weights^35,36 with S-PrediXcan³⁷ (Supplementary Table 4). Our TWAS analysis identified 1,275 significant (P_FDR < 0.05) gene–tissue pairs and 315 significant (P_FDR < 0.05) unique genes associated with F_gRI-D after false discovery rate (FDR) correction (Fig. 3a and Supplementary Table 5). Some of the top significant gene–tissue pairs associated with F_gRI-D are AC072039.2 expression in brain nucleus (Z-score of −7.74, P_FDR = 1.17 × 10⁻⁹), PPP2R3A expression in cerebellum (Z-score of 7.49, P_FDR = 2.43 × 10⁻⁹) and putamen (Z-score of 7.47, P_FDR = 2.43 × 10⁻⁹) and FOXO3 expression in anterior cingulate cortex (Z-score of 6.07, P_FDR = 1.15 × 10⁻⁵) (Fig. 3a). Functional enrichment analysis of the significant (P_FDR < 0.05) TWAS genes using PANTHER³⁸ did not identify any significant enrichments in Gene Ontology (GO) and PANTHER GO-Slim terms^38,39,40,41 after accounting for multiple testing (Supplementary Tables 6–11). Overall, our S-PrediXcan analysis highlighted 315 unique genes linked to F_gRI-D, including significant gene–tissue pairs (such as FOXO3 expression in the anterior cingulate cortex and PPP2R3A expression in the putamen) involving brain regions with known relevance for music processing^42,43.
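The FDR step can be illustrated with the Benjamini-Hochberg procedure. This is an assumption: the text above does not state which FDR method was used, and Benjamini-Hochberg is shown only because it is the most common choice. The function name and the four P values are made up for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Made-up P values for four gene-tissue pairs
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
significant = [a < 0.05 for a in adj]
```

A gene-tissue pair would then be called significant when its adjusted value is below 0.05, mirroring the P_FDR < 0.05 threshold used in the text.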

**Fig. 3: S-PrediXcan and LDSC partitioned heritability results for regulatory brain cell type annotations.**

To investigate the neurobiology of genetic variation shared between rhythm impairment and dyslexia at cell type resolution, we performed LDSC partitioned heritability analysis⁴⁴ using cell type-specific regulatory region annotations of neurons, microglia, astrocytes and oligodendrocytes⁴⁵. We found robust significant SNP heritability enrichments in the promoters of neurons (enrichment (s.e.m.) of 8.14 (1.55), P_FDR = 3.38 × 10⁻⁵), oligodendrocytes (enrichment (s.e.m.) of 7.98 (1.53), P_FDR = 3.38 × 10⁻⁵), astrocytes (enrichment (s.e.m.) of 7.72 (1.59), P_FDR = 1.1 × 10⁻⁴) and microglia (enrichment (s.e.m.) of 4.47 (1.63), P_FDR = 0.04), as well as enhancers of neurons (enrichment (s.e.m.) of 4.43 (0.35), P_FDR = 7.96 × 10⁻¹⁸) and astrocytes (enrichment (s.e.m.) of 2.73 (0.58), P_FDR = 4.35 × 10⁻³) (Fig. 3b and Supplementary Table 12). Consistent with the original rhythm and dyslexia GWAS reports^17,18, F_gRI-D relates to brain structure in part by common effects at regulatory regions within multiple cell types, including neuronal and various non-neuronal cells, such as oligodendrocytes. This may suggest that F_gRI-D impacts myelination and white matter connectivity patterns that could potentially instantiate neural overlap between rhythm and reading-related aspects of language^1,10,46.

To test the validity of links between rhythm impairment and dyslexia risk and proneness to certain neuropsychiatric disorders proposed by the ARRH, we moved on to investigate relationships of F_gRI-D with psychiatric, neurological and behavioural traits, examining patterns of genetic correlations with common and independent factors in more detail. First, we curated 88 sets of GWAS summary statistics including traits that were significantly genetically correlated either with rhythm or dyslexia in the original GWAS reports^17,18 and three additional education-related traits²⁸ (Supplementary Table 13). To reduce the statistical burden of multiple testing correction in our subsequent analyses, we subset this initial set of 88 traits on the basis of their levels of genetic correlation among themselves. To do so, we estimated pairwise genetic correlations, and identified the most highly correlated traits (|r_g| > 0.80; Supplementary Fig. 5). We then performed hierarchical clustering, obtaining one representative trait from each cluster of highly correlated traits (Supplementary Fig. 6). This approach yielded 49 traits that were relatively genetically independent (see Methods for details), for which we estimated the genetic correlations with F_gRI-D, and with the summary statistics obtained by the IPM (Supplementary Fig. 7 and Supplementary Table 14). Genetic correlations between F_gRI-D and the assessed traits ranged from −0.56 to 0.46, and mostly lay between the genetic correlation estimates for independent factors (Supplementary Fig. 7), supporting the interpretation that F_gRI-D indeed captures the common genetic factor of rhythm impairment and dyslexia. We found significant negative correlations between F_gRI-D and non-word repetition (r_g (s.e.m.) = −0.513 (0.099), P_FDR = 7.03 × 10⁻⁷) and phoneme awareness (r_g (s.e.m.) = −0.562 (0.058), P_FDR = 3.78 × 10⁻²¹), validating the F_gRI-D construct’s link to reading- and language-related traits.
Positive genetic correlations were observed for attention deficit hyperactivity disorder (r_g (s.e.m.) = 0.237 (0.029), P_FDR = 3.69 × 10⁻¹⁵), autism spectrum disorder (r_g (s.e.m.) = 0.075 (0.035), P_FDR = 4.529 × 10⁻²) and insomnia (r_g (s.e.m.) = 0.200 (0.027), P_FDR = 6.04 × 10⁻¹³), suggesting shared genetic liability with neuropsychiatric traits that have been phenotypically linked to rhythm⁴⁷. In total, F_gRI-D showed significant (P_FDR < 0.05) genetic correlations with 37 of the 49 selected psychiatric/neurological/behavioural traits with varying magnitudes and directions, including attention deficit hyperactivity disorder, Parkinson’s disease, health satisfaction and loneliness (|r_g| median of 0.146 and range of 0.06–0.56). Consistent with the ARRH, the directionality of genetic correlations suggests that decreased rhythm impairment/dyslexia risk may be associated with resilience to certain neuropsychiatric disorders. These genetic correlations also reflect a shared genomic architecture underlying rhythm, dyslexia and social traits, showing that social function and co-evolution hypotheses of rhythm and communication skills^48,49,50 are plausible from a genetic perspective. Future work will be needed to disentangle possibly shared genomic substrates of the evolution of social interaction, language and music.
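The trait-pruning step described above (grouping traits with |r_g| > 0.80 and keeping one representative per group) can be sketched in plain Python. The sketch treats the threshold as a single-linkage cut, which is equivalent to finding connected components among traits linked by |r_g| > 0.80. The linkage method actually used and the rule for choosing representatives are given in the Methods and are not reproduced here, so the alphabetical rule below is purely hypothetical, as are the trait names and correlation values.

```python
def cluster_by_threshold(traits, rg, threshold=0.80):
    """Group traits connected by |r_g| > threshold (single-linkage cut)."""
    parent = {t: t for t in traits}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    for (a, b), value in rg.items():
        if abs(value) > threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for t in traits:
        clusters.setdefault(find(t), []).append(t)
    return sorted(sorted(members) for members in clusters.values())

# Made-up traits and pairwise genetic correlations
traits = ["A", "B", "C", "D"]
rg = {("A", "B"): 0.92, ("B", "C"): -0.85, ("A", "C"): 0.70, ("C", "D"): 0.10}
clusters = cluster_by_threshold(traits, rg)
# Hypothetical rule: keep the alphabetically first trait of each cluster
representatives = [members[0] for members in clusters]
```

In this toy input, A, B and C collapse into one cluster even though A and C are only moderately correlated, because B links them. This chaining is a known property of single linkage and is one reason a different linkage rule might be preferred in practice.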

Even though reading is a recent human innovation, it recruits language-related brain circuits^51,52, which have undergone biological evolution on the lineage leading to humans. Similarly, dyslexia manifests overtly as a reading or spelling disorder, yet in many cases this reflects underlying deficits in aspects of oral language (for example, phonological awareness)^19,20,21,22. Given this link between spoken language and reading, and in light of theoretical frameworks positing co-evolution of rhythm and language-related skills in humans^10,48,49,50,53, we leveraged genomic methods to investigate the evolution of the overlap between rhythm and the reading-related aspect of language over a range of timescales (Fig. 4a). We first performed LDSC partitioned heritability analysis using five evolutionary annotations tagging foetal brain human-gained enhancers⁵⁴, Neanderthal introgressed alleles⁵⁵, archaic deserts⁵⁶ and primate-conserved and accelerated regions⁵⁷ (Fig. 4a). This revealed significant SNP heritability depletions in Neanderthal introgressed alleles, and significant enrichments in primate-conserved regions for all traits (Fig. 4b and Supplementary Table 15), in line with findings for many other complex human traits^58,59. We then used the SBayesS function of the GCTB package⁶⁰ to probe the effect size-minor allele frequency relationship (Ŝ), an essential component of the complex trait genetic architecture influenced by natural selection⁶⁰. Similar to most cognitive and behavioural traits⁶⁰, we found moderate levels of negative selection acting on F_gRI-D (Ŝ (s.d.) = −0.51 (0.05)) and the independent factors of dyslexia (Ŝ (s.d.) = −0.47 (0.06)) and rhythm impairment (Ŝ (s.d.) = −0.49 (0.06)) (Fig. 4d and Supplementary Table 16).
Next, we performed MAGMA gene set analysis⁶¹ to investigate links between genes that are co-located with various evolutionary annotations, spanning a timescale from ~8 million to ~35,000 years ago, which were not testable via partitioned heritability analysis owing to low SNP coverage (Methods). Specifically, we tested whether genetic variation associated with F_gRI-D was enriched in genes that overlap with four evolutionary annotations (Supplementary Tables 17–20): (1) ancient selective sweep sites⁶², (2) human accelerated regions^63,64,65,66, (3) differentially methylated regions (DMRs) between anatomically modern humans and archaic humans⁶⁷ and (4) DMRs between anatomically modern humans and chimpanzees⁶⁷. These gene set-based analyses did not yield any significant enrichment signals (Supplementary Table 21), indicating an absence of evidence for associations between F_gRI-D and these four annotations. We then extended our MAGMA gene set enrichment analysis to look for potential links between F_gRI-D and genomic substrates of songbird vocal learning, in line with theoretical predictions¹⁰, and prior associations with the genetic architecture of beat synchronization⁶⁸. To do so, we used nine gene sets that were curated by Gordon et al.⁶⁸ and therein converted to human homologues for the purposes of gene enrichment analyses; each set represents differential gene expression patterns associated with vocal learning phenotypes (for example, song versus silence or number of motifs sung) in Area X and other key regions of zebra finch neural circuitry related to song learning. Intriguingly, we found significant enrichments in four Area X-related gene sets (Supplementary Table 22) using the F_gRI-D summary statistics. These findings may suggest overlapping molecular mechanisms between songbird vocal learning, human rhythm and human language, supporting theories of cross-species convergent evolution of vocal learning and beat perception^10,69.
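To make the meaning of a negative Ŝ concrete, the sketch below uses the relationship between effect-size variance and minor allele frequency that underlies the S parameter, in which the variance of a SNP's effect scales with [2p(1 − p)]^S for allele frequency p. It is an illustration of that scaling only, not a reimplementation of SBayesS. The function name and the two example allele frequencies are assumptions, while Ŝ = −0.51 is the F_gRI-D point estimate quoted above.

```python
def relative_effect_variance(maf, s):
    """Effect-size variance relative to a common scale, [2p(1-p)]^S."""
    return (2.0 * maf * (1.0 - maf)) ** s

s_hat = -0.51  # point estimate for F_gRI-D quoted in the text
rare = relative_effect_variance(0.01, s_hat)
common = relative_effect_variance(0.50, s_hat)
ratio = rare / common
print(f"Expected effect-size variance, MAF 0.01 vs 0.50: {ratio:.1f}x")
```

With a negative S, rarer variants are expected to carry larger effects than common ones, which is the signature of negative selection keeping large-effect alleles at low frequency. With S = 0, effect sizes would be independent of allele frequency.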

**Fig. 4: Evolutionary analyses of dyslexia, rhythm impairment, F_gRI-D and independent factors.**

To follow up the significant partitioned SNP heritability enrichments in primate-conserved regions, we investigated the association between F_gRI-D mvGWAS P values and per-SNP primate phastCons scores⁵⁷ for 38,164 clumped SNPs (P < 0.05, r² < 0.06) from F_gRI-D summary statistics (Fig. 4c), and found that one of the F_gRI-D genome-wide significant (P < 5 × 10⁻⁸) hits, rs10891314, had an exceptionally high phastCons score, probably because it is a missense variant (Fig. 4c). We zeroed in on this genome-wide significant hit as an example locus and dissected patterns of Q_b, and conservation or accelerated evolution in primates (Fig. 4e), confirming the sharp increase in conservation rate for the SNP rs10891314. The Human Genome Dating Atlas⁷⁰ estimates this polymorphism to be 11,199 generations old (95% CI), corresponding to ∼280,000 years ago assuming 25 years per generation, around the time period when the oldest known Homo sapiens fossils have been dated⁷¹. rs10891314 is located in the DLAT gene, which is associated with a rare neurodevelopmental disorder, pyruvate dehydrogenase E2 deficiency. This condition is characterized by neurological dysfunction, dystonia and learning disability mainly appearing during childhood⁷². DLAT is highly conserved and loss-of-function intolerant (probability of loss-of-function intolerance 6.68) (ref. ⁷³), which makes this particular missense variant an interesting candidate for increasing susceptibility to rhythm impairment and dyslexia.

After assessing evolutionary signatures on F_gRI-D at the genome-wide and SNP levels, we extended our investigations of rhythm–language co-evolution by integrating with independent data from neuroimaging genetics. To do so, we estimated local genetic correlations between F_gRI-D and fractional anisotropy (FA) measures of five left hemispheric white matter tracts (Supplementary Fig. 8 and Supplementary Table 23), involved in the dorsal or ventral streams of spoken language, and theorized as key components of rhythm–language convergent evolution^10,69. Using local analysis of (co)variant association (LAVA)⁷⁴, we identified a significant local genetic correlation between F_gRI-D and the left hemispheric superior longitudinal fasciculus (SLF) I (r_g = 1, P_FDR = 0.02) (Supplementary Table 24) in a ~2 Mb region on chromosome 20 (30,569,660–32,484,506), which encompasses several genes including EFCAB8, BAK1P1 and SUN5 (Supplementary Fig. 9). SLF-I is the dorsal division of the SLF, connecting the superior parietal and superior frontal lobes⁷⁵ (Supplementary Fig. 8). The SLF has been shown to have functional links to musical rhythm^76,77, and the SLF-I subdivision is involved in the regulation of motor behaviour^78,79. This finding is partially consistent with the hypothesized role of the dorsal stream in supporting co-evolution of phonological processing and beat synchronization⁴.

Discussion

We showed robust genetic correlations among musical rhythm, dyslexia and a number of reading- and language-related traits, providing genetic evidence for the ARRH. Traits, such as non-word repetition, phoneme awareness and having better language skills than mathematics at school, showed significant positive correlations with rhythm skills, which makes these particular traits interesting candidates to study in future genetic studies of shared biology of rhythm and language. Importantly, we also found a significant correlation between rhythm and language resting-state functional connectivity, suggesting shared genetic and neuronal architecture for rhythm and reading-related aspects of language.

The bivariate genomic SEM approach that we developed allowed us to identify genetic overlaps between rhythm impairment and dyslexia and to present a map of homogeneous and heterogeneous genetic effects, shedding light on patterns of pleiotropy between the two¹. Among 18 genome-wide significant loci associated with the common factor of dyslexia and rhythm impairment, the strongest mvGWAS signal comes from a locus tagged by the SNP rs28576629 that is mapped to PPP2R3A, a gene implicated in the negative control of cell growth and division, suggesting a putative role for this gene in dyslexia and rhythm impairment prevalence⁸⁰. Given that we validated our common factor results using two additional mvGWAS methods, we believe that the shared genetic factor that we captured represents a solid first glimpse into the shared genetics of dyslexia and rhythm impairment. Results of this kind might potentially contribute, together with information on other risk factors, towards improved diagnostics of individuals’ propensity for reading- and rhythm-related problems, to enable special educational support. However, given the highly polygenic and environment-dependent nature of behavioural traits, the early risk identification power of our results remains to be explored⁸¹.

Our post-mvGWAS analyses enhance understanding of the aetiology of rhythm and language (on which reading depends) by revealing intricate links across rhythm impairment, dyslexia and various aspects of evolutionary past and neurobiological function, including gene expression in brain tissue, brain cell type-specific gene regulation and a local genetic correlation with a tract linked to regulation of higher aspects of motor behaviour⁷⁵. Our TWAS results validated the association between PPP2R3A and the common factor of dyslexia and rhythm impairment, and narrowed down the overall relevance of this gene for rhythm and reading skills into its expression profile in cerebellum and putamen. We believe that our TWAS results constitute a potentially important gene–tissue pair list to study the links between genetic variants, gene expression in the brain and subsequent effects on neurodevelopment using experimental assays. Our SNP heritability enrichment results in cell type-specific regulatory regions point to multiple brain cell types for follow-up work, without pinning down a specific neuronal or non-neuronal brain cell type, indicating a complex cellular composition in the brain supporting rhythm and language. Heritability enrichment signals in brain-specific regulatory regions provide additional evidence that the dyslexia and rhythm GWASs largely capture the neurodevelopmental aspects of these traits. The genome-wide genetic correlation analysis between the common factor and a set of behavioural traits yielded two important findings. First, the genetic correlation directions and magnitudes point to a link between rhythm impairment and dyslexia risks, and neuropsychiatric disorders, in line with the ARRH. 
Second, language-related traits, such as non-word repetition and phoneme awareness, yielded the strongest genetic correlations with the common factor of dyslexia and rhythm impairment, further supporting the statistical validity and language or reading relevance of F_gRI-D.

The evolutionary analyses aimed to provide empirical genetic data as groundwork towards understanding potential evolutionary forces acting jointly on human rhythm and language-related skills^53,82. Our significant SNP heritability depletion findings in Neanderthal introgressed alleles are in line with findings for other complex human traits⁵⁸, indicating a reduced contribution of Neanderthal alleles to reading- and rhythm-related skills. Similarly, heritability enrichment findings in primate-conserved regions converge with studies that previously found significant enrichments in these loci for complex disease traits⁵⁹. We also noted a trend for the majority of mvGWAS lead SNPs to have lower primate conservation scores, indicating weaker constraint on these variants in the primate clade, which may have relevance for the evolution of language and rhythm-related traits on the human lineage. This observation lacks statistical confirmation, but would be in line with prior behavioural and neural findings showing a lack of complex musical rhythm detection and synchronization in species such as macaque monkeys⁸³. Interestingly, one mvGWAS hit locus, mapped to the DLAT gene, stood out as strongly conserved among primates, which makes this gene a potential candidate for future experimental investigations in this area. Our analyses showed significant enrichments of F_gRI-D-associated variants in gene sets curated from transcriptome studies of songbird vocal learning, specifically in a key nucleus of the zebra finch brain, Area X. This link between human genetic variants shaping human reading and rhythm skills, and genes involved in songbird song production (for example, number of motifs sung) in Area X is in line with prior literature⁶⁸, further supporting the importance of shared genetic substrates.
Finally, the significant local genetic correlation that we identified between SLF-I and F_gRI-D-associated variants in a ~2 Mb region on chromosome 20 represents an interesting example of potential pleiotropic associations between language- and musical rhythm-related traits and white matter microstructure. This finding is particularly interesting as SLF-I is involved in motor behaviour regulation^78,79, suggesting the presence of shared genetic and neuroanatomical elements between motor aspects of language and musical rhythm. It is also plausible that the shared genetic factors underlying language- and musical rhythm-related skills influence a broader range of cognitive processes rather than being confined to the intersection between language and musical rhythm. Here, we note that future local and genome-wide genetic correlation analyses between F_gRI-D and a larger selection of neuroanatomical traits (for example, anterior and posterior segments of the arcuate fasciculus, inferior fronto-occipital fasciculus) and other imaging modalities are necessary to reveal the shared genetics and neuroanatomy of language- and musical rhythm-related traits.

Our study has several limitations. First, the original dyslexia and rhythm GWASs were performed on European ancestry samples owing to data availability. The lack of large-scale GWASs in non-European populations and the strong sampling bias in large cohorts towards European ancestry individuals hinder a global picture of the genetic architecture of these traits, weaken the replicability of GWAS findings in diverse populations⁸⁴ and limit the interpretability of post-GWAS evolutionary analysis results. This is especially true for behavioural and psychiatric traits that are more prone to be affected by cultural, socioeconomic and environmental factors, which is also reflected in the weaker transferability of polygenic scores across ancestries for such traits⁸⁵. We believe that important steps to solve this will include encouraging and contributing to the generation of large-scale databases with genotype or phenotype data of individuals from diverse genetic ancestries, disentangling the unique gene–environment interactions in other populations, developing GWAS methods and study designs to more carefully take potential confounders into account and improving the reliability of behavioural phenotype measurement techniques. Second, the self-report-based phenotype descriptions in the original GWASs are not ideal measures, but rather represent robust validated proxies that uniquely enable scaling up of data collection to very large cohorts. There is a compromise between the practical convenience of self-report-based data collection from hundreds of thousands of individuals, which is extremely challenging using task-based measurements, and introducing self-assessment-related uncertainties to the data. 
We note that the phenotype in the dyslexia GWAS was not simply self-assessment but rather self-report of having received a positive dyslexia diagnosis, and that the genetic architecture was found to be stable across the lifespan, with a genetic correlation of 0.97 between younger (<55 years) and older (>55 years) subgroups of participants¹⁸. Moreover, the construct validity of the rhythm GWAS phenotype is supported by associations between the self-reported and directly measured rhythm skills¹⁷, and polygenic scores trained on the rhythm summary statistics correlate with scores on a rhythm discrimination test^86,87, further supporting the view that genetic signals associated with self-reported beat synchronization ability are an appropriate reflection of rhythm-related skills. In addition, the genetic correlations between dyslexia, rhythm impairment, F_gRI-D and other speech and language-related phenotypes suggest that the original GWASs largely capture relevant genetic factors. Third, even though potential confounders, such as age and sex, were included as covariates in the dyslexia and rhythm GWAS regression models, we cannot fully exclude residual effects of such factors and other confounders. The original dyslexia GWAS addressed the impacts of age (as noted above) and sex by performing sensitivity analyses using age- and sex-specific GWAS, which showed little effect of either of these two factors on GWAS results¹⁸. Fourth, the majority of genetic factors shaping human language and musical rhythm skills are probably fixed in all human populations. Hence, post-GWAS evolutionary analyses, which leverage present-day variation to probe links between genetic variants and evolutionary annotations, are not ideal to study the origins of traits that probably emerged at earlier periods of hominin evolution. 
Methodological developments in the complex trait evolution field, and the integration of ancient DNA data from older timepoints of human evolution into polygenic selection analysis methods would greatly help to resolve these challenges in future studies. Finally, relationships between heritability and evolution are quite complex. The individual contributions of common genetic variants to heritability are jointly shaped by selection and demography⁸⁸, which limits the evolutionary interpretation of heritability-based methods.

Despite such constraints, our study represents a step towards characterizing the shared genetic architecture between rhythm- and language-related traits, and provides a valuable analytic pipeline tackling the shared genetics and evolution of rhythm and reading-related aspects of language from various angles. We reveal complex links across common DNA variants, genes, genomic loci, white matter structures and human behaviour, making a first set of links across the immensely long causal chain spanning these layers. Developing and applying more sophisticated methods to dissociate environmental confounds from genetics will allow future studies to obtain a better understanding of the genetics and evolution of human language and musicality.

Methods

GWAS summary statistics

Beat synchronization and dyslexia GWAS summary statistics^17,18 were obtained from 23andMe Inc., a consumer genetics company. Both GWASs were performed on European ancestry individuals through online participation. All participants provided informed consent according to 23andMe’s human subject protocol, which is reviewed and approved by the external Association for the Accreditation of Human Research Protection Programs, Inc.-accredited institutional review board, Ethical and Independent Review Services, a private institutional review board (http://www.eandireview.com). The 23andMe sample prevalence of dyslexia is 4.6% (N_total = 1,138,870, mean age 51 years), and the sample prevalence of beat synchronization is 92% (N_total = 606,825, mean age 52 years). Summary statistics files were reformatted and harmonized to include required columns (for example, SNP ID, beta, beta s.e.m. and P value) for each mvGWAS tool following the guidelines in the original publications of each tool. To obtain rhythm impairment summary statistics, effect sizes in the binary beat synchronization GWAS summary statistics were multiplied by −1 so that the effect directions were reversed. This yielded a set of GWAS summary statistics comprising SNP effects contributing to rhythm impairment, which was used for the subsequent mvGWAS analysis with dyslexia. We applied GC correction to both sets of summary statistics for all non-LDSC-based analyses. For LDSC-based analyses (including genomic SEM), uncorrected summary statistics were used as input, as GC correction biases the LDSC SNP heritability estimates downwards. The resulting set of summary statistics from genomic SEM was GC corrected.
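The sign reversal and GC correction described above can be illustrated with a short Python sketch. This is not the authors' pipeline: the function names and toy values are hypothetical, and the GC step assumes the conventional definition of λ_GC as the median χ² statistic divided by the median of a χ² distribution with 1 d.f. (approximately 0.4549), applied by inflating the standard errors.

```python
import math
import statistics

CHI2_1DF_MEDIAN = 0.4549364  # median of a chi-squared distribution with 1 d.f.

def flip_effects(betas):
    """Reverse effect directions (beat synchronization -> rhythm impairment)."""
    return [-b for b in betas]

def gc_correct(betas, ses):
    """Return (lambda_gc, corrected standard errors).

    Assumption: conventional genomic control, i.e. each standard error is
    inflated by sqrt(lambda_gc), which divides each chi-squared statistic
    by lambda_gc. No correction is applied when lambda_gc <= 1.
    """
    chi2 = [(b / s) ** 2 for b, s in zip(betas, ses)]
    lambda_gc = statistics.median(chi2) / CHI2_1DF_MEDIAN
    scale = math.sqrt(max(lambda_gc, 1.0))
    return lambda_gc, [s * scale for s in ses]

# Toy values (hypothetical, for illustration only)
betas = [0.02, -0.01, 0.03, -0.04, 0.005]
ses = [0.01, 0.01, 0.01, 0.01, 0.01]
ri_betas = flip_effects(betas)            # rhythm impairment effects
lambda_gc, ses_gc = gc_correct(ri_betas, ses)
```

Flipping the sign leaves the χ² statistics, and therefore λ_GC, unchanged, so the order of the two steps does not matter in this sketch.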

SNP heritability and genetic correlation estimations

We used LDSC²⁵ (v1.0.1) to estimate the SNP heritabilities and genetic correlations. For rhythm impairment and dyslexia, we estimated the total SNP heritability on a liability scale using population and sample prevalence information from the original studies (sample prevalence of 0.045 for dyslexia and 0.085 for rhythm impairment, and a population prevalence of 0.050 for dyslexia and 0.048 for rhythm impairment). Genetic correlations were estimated using bivariate LDSC between rhythm, dyslexia, GenLang quantitative reading or language-related traits²⁷, Danish School Grades GWAS²⁸ and all external summary statistics except for the planum temporale asymmetry and the language resting-state functional connectivity, which were assessed as described below.
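As a rough illustration of the liability-scale conversion, the sketch below applies the standard transformation for ascertained case-control data, h²_liab = h²_obs × K²(1 − K)² / [P(1 − P) z²], where K is the population prevalence, P the sample prevalence and z the standard normal density at the liability threshold. LDSC performs this conversion internally; the function name and the observed-scale value of 0.05 are hypothetical, and only the prevalences are taken from the text.

```python
from statistics import NormalDist

def liability_h2(h2_obs, sample_prev, pop_prev):
    """Convert observed-scale SNP heritability to the liability scale."""
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1.0 - pop_prev)  # liability threshold
    z = std_normal.pdf(threshold)                   # normal density at threshold
    k, p = pop_prev, sample_prev
    return h2_obs * (k * (1 - k)) ** 2 / (p * (1 - p) * z ** 2)

# Prevalences from the text; the observed-scale h2 of 0.05 is a placeholder.
h2_dyslexia = liability_h2(0.05, sample_prev=0.045, pop_prev=0.050)
h2_rhythm = liability_h2(0.05, sample_prev=0.085, pop_prev=0.048)
```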

To estimate genetic correlations between rhythm and planum temporale asymmetry³⁰ and between rhythm and language resting-state functional connectivity²⁹, we used an approach proposed by Naqvi et al.⁸⁹ applicable to unsigned multivariate statistics, as the mvGWAS effect sizes or beta values, which are required to run genetic correlation analysis using LDSC, were not available for these traits. We evaluated the amount of shared signal between each pair of GWASs by estimating the Spearman correlation of the average SNP P values within approximately independent linkage disequilibrium (LD) blocks⁹⁰. We first filtered the genome-wide SNPs using the HapMap3 reference panel without the major histocompatibility complex region (https://github.com/bulik/ldsc). We then split the genome-wide SNPs into 1,703 approximately independent blocks⁹⁰. For each approximately independent LD block, we computed the average SNP −log₁₀(P) value. We then estimated a rank-based Spearman correlation using the averaged association value (n = 1,703) for each LD block. A standard error of the Spearman correlation was estimated using statistical re-sampling with 10,000 bootstrap cycles with replacement from the 1,703 LD blocks.
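The block-averaging and bootstrap procedure described above can be sketched as follows. This is a minimal standard-library illustration under stated assumptions, not the implementation of Naqvi et al.: the function names and toy data are hypothetical, ties receive average ranks, and resamples in which either variable is constant are skipped.

```python
import math
import random
from collections import defaultdict

def block_means(block_ids, pvals):
    """Average -log10(P) of the SNPs within each LD block."""
    sums, counts = defaultdict(float), defaultdict(int)
    for blk, p in zip(block_ids, pvals):
        sums[blk] += -math.log10(p)
        counts[blk] += 1
    return {blk: sums[blk] / counts[blk] for blk in sums}

def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def bootstrap_se(x, y, n_boot=10_000, seed=0):
    """Standard error of Spearman's rho, resampling blocks with replacement."""
    rng = random.Random(seed)
    n = len(x)
    rhos = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        xs, ys = [x[i] for i in idx], [y[i] for i in idx]
        if len(set(xs)) > 1 and len(set(ys)) > 1:  # skip degenerate resamples
            rhos.append(spearman(xs, ys))
    mean = sum(rhos) / len(rhos)
    return math.sqrt(sum((r - mean) ** 2 for r in rhos) / (len(rhos) - 1))

# Toy block-level values (hypothetical) standing in for the 1,703 LD blocks
rng = random.Random(1)
trait_a = [rng.random() for _ in range(30)]
trait_b = [a + 0.5 * rng.random() for a in trait_a]
rho = spearman(trait_a, trait_b)
se = bootstrap_se(trait_a, trait_b, n_boot=500)
```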

mvGWAS

To investigate the shared genetic variance of rhythm impairment and dyslexia, we performed mvGWASs using three tools: genomic SEM²⁶, N-weighted GWAMA³³ and CPASSOC³⁴. These tools use GWAS summary-level data and account for genetic correlation and sample overlap using the cross-trait LD score regression intercept.

Genomic SEM (common and independent pathway models)

First, we reformatted our summary statistics for LDSC (munged) and genomic SEM, following standard guidelines (https://github.com/GenomicSEM/GenomicSEM/wiki). We then used the multivariable extension of LDSC to estimate the 2 × 2 empirical genetic covariance matrix between rhythm impairment and dyslexia and their associated sampling covariance matrix. We specified a measurement model (Supplementary Fig. 1), where a shared genetic factor (F_gRI-D) was estimated to capture the observed genetic covariance between rhythm impairment and dyslexia. Given that the number of observed parameters for any 2 × 2 covariance matrix equals 3, we constrained all paths between F_gRI-D and the two traits to 1. The final common pathway model (CPM) was fit to a genetic covariance matrix that incorporates the tested SNP (Supplementary Fig. 1). F_gRI-D was regressed on each SNP, and residuals were freely estimated. The 1,000 Genomes Phase 3 reference panel⁹¹ was used as the reference panel to calculate SNP variance across traits. The effective sample size of each GWAS was calculated as 4 × N_cases × (1 − N_cases/N_total). Both the reference panel and effective sample sizes were then fed into the sumstats function and summary statistics were prepared for the meta-analysis. We applied genomic correction to the CPM results on the basis of the genomic inflation index estimated by LDSC (λ_GC = 1.62; Supplementary Fig. 2). The final independent pathway model (IPM) was fit to the same matrices incorporating the SNP effects, but with the traits regressed directly on the SNP. The final bivariate heterogeneity score, Q_b, was obtained from a chi-squared difference test, in which the chi-squared of the IPM is subtracted from the chi-squared of the CPM (Q_b = χ²_CPM − χ²_IPM) (ref. ³¹). A high Q_b value indicates that the association between the SNP and rhythm impairment or dyslexia is not well accounted for by the factor F_gRI-D. We then used the intersect function of bedtools (v. 2.29.2)⁹² to identify the overlaps between genome-wide significant (P < 5 × 10⁻⁸) Q_b (Supplementary Table 3) and CPM loci (Supplementary Table 2), as well as ±1 Mb surroundings of each CPM locus.
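Two of the quantities in this section are simple enough to sketch directly: the effective sample size formula and the Q_b difference test. The Python below is an illustration only. The case and total counts are hypothetical, and evaluating Q_b against a χ² distribution with 1 d.f. is an assumption made here for a two-trait model; the text does not state the degrees of freedom.

```python
import math

def effective_n(n_cases, n_total):
    """Effective sample size: 4 x N_cases x (1 - N_cases / N_total)."""
    return 4 * n_cases * (1 - n_cases / n_total)

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-squared statistic with 1 d.f."""
    return math.erfc(math.sqrt(x / 2))

def q_b(chi2_cpm, chi2_ipm):
    """Heterogeneity statistic Q_b = chi2_CPM - chi2_IPM and its P value.

    Assumption: Q_b is compared with a chi-squared distribution with
    1 d.f. (two indicator traits); negative differences are floored at 0.
    """
    q = chi2_cpm - chi2_ipm
    return q, chi2_sf_1df(max(q, 0.0))

# Hypothetical counts, for illustration only
n_eff = effective_n(n_cases=50_000, n_total=1_000_000)

# Hypothetical model chi-squared values for a single SNP
q, p = q_b(chi2_cpm=40.0, chi2_ipm=5.0)
heterogeneous = p < 5e-8  # genome-wide significance threshold used in the text
```

A SNP flagged this way would be one whose effects on the two traits are poorly described by a single path through the common factor.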

CPASSOC

Following the CPASSOC manual³⁴, we used the median sample size for each summary statistics file, as 23andMe SNPs can have varying sample sizes. We removed SNPs with a Z-score larger than 1.96 or less than −1.96, and extracted a 2 × 2 genetic correlation matrix for dyslexia and rhythm impairment. Then we generated an M × K matrix of summary statistics where each row represented a SNP, and two columns represented dyslexia and rhythm impairment Z-scores. We finally performed the S_hom test, and obtained a vector of P values for M SNPs using the pchisq function in R (4.0.3).

GWAMA (N-weighted)

To account for sample overlap, we first generated a matrix of cross-trait intercepts using the intercepts of LDSC genetic correlations between dyslexia and rhythm impairment summary statistics. We then performed N-weighted GWAMA by feeding the cross-trait intercept matrix and a vector of SNP heritabilities of each trait using the multivariate_GWAMA function.
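The idea behind an N-weighted combination of Z-scores that accounts for sample overlap can be sketched as below. This is not the multivariate_GWAMA implementation. It assumes that each trait is weighted by the square root of its sample size times its SNP heritability, and that the variance of the weighted sum is obtained from the cross-trait intercept matrix with a unit diagonal; all names and input values are hypothetical.

```python
import math

def n_weighted_z(z, n, h2, cti):
    """Combine per-trait Z-scores for one SNP into a single Z-score.

    z, n, h2: per-trait Z-scores, sample sizes and SNP heritabilities.
    cti: cross-trait LDSC intercept matrix (diagonal entries set to 1),
         used here as the correlation between Z-scores under the null.
    Assumption: weights are sqrt(N * h2).
    """
    w = [math.sqrt(ni * hi) for ni, hi in zip(n, h2)]
    numerator = sum(wi * zi for wi, zi in zip(w, z))
    variance = sum(w[i] * w[j] * cti[i][j]
                   for i in range(len(w)) for j in range(len(w)))
    return numerator / math.sqrt(variance)

def two_sided_p(z):
    """Two-sided P value for a standard normal Z-score."""
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical inputs for one SNP (heritabilities and intercept are placeholders)
z_meta = n_weighted_z(
    z=[3.1, 2.4],
    n=[1_138_870, 606_825],
    h2=[0.15, 0.13],
    cti=[[1.0, 0.05], [0.05, 1.0]],
)
p_meta = two_sided_p(z_meta)
```

With an identity intercept matrix this reduces to a standard weighted Z meta-analysis of independent samples; a positive off-diagonal intercept enlarges the denominator and so shrinks the combined statistic.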

TWAS

We conducted a TWAS using the S-PrediXcan framework³⁷ and the joint-tissue imputation (JTI) TWAS derived models from GTEx v8 tissues³⁵. PrediXcan predicts gene expression from the genotype profile of each individual by using the JTI model weights, which were trained on GTEx⁹³, and validated on PsychEncode⁹⁴ and GEUVADIS⁹⁵. These SNP expression weights represent the correlations between SNPs and gene expression levels. To overcome the requirement for individual-level genotype data, Barbeira et al.³⁷ derived a mathematical expression, implemented in the S-PrediXcan framework, which effectively yields similar outcomes to PrediXcan using GWAS summary statistics. S-PrediXcan and JTI weights account for LD and collinearity problems owing to high expression correlation across tissues³⁵. We filtered the 17q21.31 inversion region (~1.5 Mb long), which has multiple phenotypic associations with brain-related traits⁹⁶, to minimize the impact of this high-LD region on our results. We then corrected TWAS P values for 192,905 gene–tissue pairs, and used Z-scores and P_FDR of the significant (P_FDR < 0.05) pairs to assess gene–F_gRI-D associations.
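FDR correction recurs throughout the Methods. The text does not name the procedure, so the sketch below assumes the widely used Benjamini–Hochberg step-up method; the function name and example P values are hypothetical.

```python
def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(pvals)
    # Walk from the largest to the smallest P value, enforcing monotonicity.
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position                     # 1-based rank in ascending order
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical P values for four gene-tissue pairs
p_raw = [0.01, 0.04, 0.03, 0.005]
p_fdr = benjamini_hochberg(p_raw)
significant = [p < 0.05 for p in p_fdr]
```

In the TWAS setting the same function would be applied once to the full vector of 192,905 gene–tissue P values, so that the number of tests matches the number stated in the text.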

Gene set enrichment and pathway analyses

We used PANTHER to run statistical overrepresentation analysis in three GO and three PANTHER GO-Slim terms (biological process, molecular function and cellular component)^38,39,40,41 with 315 unique genes that we obtained from TWAS. We used 20,102 genes that we tested in TWAS as the background gene set. Results were FDR corrected for all GO and GO-Slim terms (n = 15,028).

LDSC partitioned heritability with cell type-specific annotations

We used eight human genome annotations by Nott et al.⁴⁵ tagging promoter and enhancer regions of neurons, oligodendrocytes, microglia, and astrocytes using LDSC partitioned heritability analysis⁴⁴ following the guidelines in the LDSC Wiki page (https://github.com/bulik/ldsc/wiki/Partitioned-Heritability). All enrichment analyses were controlled for the baselineLD model v2.2. Enrichment P value results were FDR corrected for eight tests.

Genetic correlations using GWAS summary statistics from neuropsychiatric or behavioural phenotypes

We first compiled 88 traits that were significantly genetically correlated either with rhythm (impairment) or dyslexia in the original respective GWAS papers^17,18. We filtered these traits to avoid unnecessary multiple testing burden and to focus on genetically independent phenotypes. We identified 46 traits that have an absolute genetic correlation greater than 0.8 with at least one other trait. Next, we created a distance matrix from the correlation estimates of these 46 traits and performed hierarchical clustering using Ward’s method⁹⁷ as the linkage method, which maximizes the within-cluster homogeneity to identify trait clusters. We identified seven clusters using the so-called elbow method, and chose the most informative and representative trait for each cluster on the basis of the highest correlation between traits and the cluster principal component. We added these seven cluster-representative traits to the remaining 42 traits and used LDSC to estimate genetic correlations with F_gRI-D and two independent factors. Genetic correlation P values were FDR corrected for 49 tests.
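The first filtering step, separating traits that are highly correlated with at least one other trait from the rest, can be sketched as follows. The trait names, correlation values and function name are hypothetical, and the clustering step itself is not reproduced here.

```python
def split_by_redundancy(traits, rg, threshold=0.8):
    """Split traits by whether |r_g| exceeds the threshold with any other trait.

    rg maps (trait_a, trait_b) pairs to genetic correlation estimates.
    Returns (traits passed on to clustering, remaining traits).
    """
    redundant = set()
    for (a, b), r in rg.items():
        if a != b and abs(r) > threshold:
            redundant.update((a, b))
    remaining = [t for t in traits if t not in redundant]
    return sorted(redundant), remaining

# Hypothetical traits and genetic correlations
traits = ["trait_A", "trait_B", "trait_C", "trait_D"]
rg = {
    ("trait_A", "trait_B"): 0.91,
    ("trait_A", "trait_C"): 0.35,
    ("trait_A", "trait_D"): -0.10,
    ("trait_B", "trait_C"): 0.40,
    ("trait_B", "trait_D"): 0.05,
    ("trait_C", "trait_D"): 0.20,
}
to_cluster, kept = split_by_redundancy(traits, rg)
```

In the study, the analogous split gave 46 traits for clustering and 42 remaining traits, which together with the seven cluster representatives account for the 49 tests.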

Partitioned heritability analysis with custom evolutionary annotations

We used LDSC²⁵ (v1.0.1) to estimate partitioned SNP heritability enrichments or depletions in foetal brain human-gained enhancers⁵⁴, Neanderthal introgressed alleles⁵⁵, archaic deserts⁵⁶, conserved loci in the primate phylogeny (Conserved_Primate_phastCons46way annotation from baselineLD) and genomic loci that have a primate phyloP score⁵⁷ less than −2 (presumably suggesting accelerated evolution). All annotations were controlled for baselineLD model v2.2. Foetal brain human-gained enhancers were also controlled for foetal brain active regulatory elements from the Roadmap Epigenomics Consortium database⁹⁸.

MAGMA gene set analysis with custom evolutionary and songbird vocal learning gene lists

We compiled four additional evolutionary genomic annotations for MAGMA gene set analysis⁶¹, which cover timescales from ~8 million years ago to ~35,000 years ago: ancient selective sweeps⁶², human accelerated regions^63,64,65,66, anatomically modern human-derived DMRs⁶⁷ and human versus chimpanzee DMRs⁶⁷. These annotations either tag regulatory or selective sweep sites, and cover less than 1% of the total number of well-imputed SNPs in 1,000 Genomes Phase 3 reference panel, which makes them unsuitable for LDSC partitioned heritability analysis. We listed the genes that fall within ±1 kilobase of each locus tagged by each annotation, and filtered these initial gene lists for protein-coding genes using the National Center for Biotechnology Information’s (NCBI) hg19 genome annotation⁹⁹. The resulting protein-coding gene lists were used for MAGMA gene set enrichment analysis for F_gRI-D summary statistics. We first performed gene annotation by integrating SNP locations from the summary statistics, and gene locations from NCBI hg19 genome annotations. We then performed a gene analysis using SNP P values and 1,000 Genomes Phase 3 European panel⁹¹. We finally applied a gene set analysis using results from gene annotation and gene analysis, and four gene-sets. Enrichment P values were FDR corrected for four tests. In the second part of our MAGMA analysis, we used nine additional songbird brain-expressed gene sets from Gordon et al.⁶⁸ and performed a separate gene set enrichment analysis for F_gRI-D. Enrichment P values were FDR corrected for nine tests.

Genome-wide negative selection estimation

We performed SBayesS analysis on the rhythm impairment, dyslexia, F_gRI-D, and two independent factor GWAS summary statistics using the GCTB software (version 2.02)⁶⁰ to quantify the level of negative selection acting on these traits. SBayesS estimates total SNP heritability, polygenicity and the relationship between variants’ minor allele frequencies and effect sizes, and generates a genome-wide negative selection metric (S), which ranges from 0 to −1. S estimates that are closer to −1 are interpreted as a sign of strong negative selection⁵⁰, whereas estimates closer to 0 can suggest positive selection⁶⁰.

LAVA local genetic correlations with white matter connectivity measures

To identify local regions of the genome that might make shared contributions to rhythm, language and evolutionarily relevant brain circuitry, we tested local genetic correlations between F_gRI-D and white matter connectivity measures. We performed GWASs of selected brain imaging traits using data from the UK Biobank¹⁰⁰. For these GWASs, UK Biobank data first underwent sample and genetic quality control and brain imaging data processing, followed by genome-wide association analysis.

Sample quality control

This study used the UK Biobank February 2020 release (research application number: 79683). All participants provided informed consent, and the study was approved by the North West Multi-Centre Research Ethics Committee. For individuals with both diffusion-weighted magnetic resonance imaging (MRI) and genotyping data, we excluded participants with unusual heterozygosity (principal components corrected heterozygosity >0.19), high missingness (missing rate >0.05) or sex mismatches between genetically inferred sex and self-reported sex, as reported by Bycroft et al.¹⁰⁰. We further restricted our analyses to individuals with white British ancestry as defined by Bycroft et al.¹⁰⁰ to avoid any possible confounding effects related to ancestry. This resulted in 31,465 individuals (mean age of 55.21 years old, range 40–70 years old, 16,497 females) passing the sample quality control.

Genetic quality control

The imputed genotypes were obtained from the UK Biobank portal. These data underwent a stringent quality control protocol. We excluded SNPs with minor allele frequencies below 1%, Hardy–Weinberg P value below 1 × 10⁻⁷ or imputation quality INFO scores below 0.8. Multi-allelic variants that cannot be handled by many programs used in genetic-related analyses were removed. This resulted in 9,422,496 autosomal SNPs that were analysed in the GWAS.
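The variant-level filters listed above amount to a simple predicate per SNP. The sketch below is illustrative only: the record layout and example values are hypothetical, and in practice such filters are usually applied by the genotype-processing software rather than in custom code.

```python
from dataclasses import dataclass

@dataclass
class Snp:
    rsid: str
    maf: float       # minor allele frequency
    hwe_p: float     # Hardy-Weinberg equilibrium P value
    info: float      # imputation quality (INFO) score
    n_alleles: int   # 2 for bi-allelic variants

def passes_qc(snp, min_maf=0.01, min_hwe_p=1e-7, min_info=0.8):
    """Keep bi-allelic SNPs that are not below any threshold in the text."""
    return (snp.maf >= min_maf
            and snp.hwe_p >= min_hwe_p
            and snp.info >= min_info
            and snp.n_alleles == 2)

# Hypothetical variants
snps = [
    Snp("rs_example1", maf=0.25, hwe_p=0.40, info=0.99, n_alleles=2),
    Snp("rs_example2", maf=0.005, hwe_p=0.40, info=0.99, n_alleles=2),  # rare
    Snp("rs_example3", maf=0.25, hwe_p=1e-9, info=0.99, n_alleles=2),   # HWE
    Snp("rs_example4", maf=0.25, hwe_p=0.40, info=0.60, n_alleles=2),   # INFO
    Snp("rs_example5", maf=0.25, hwe_p=0.40, info=0.99, n_alleles=3),   # multi-allelic
]
kept = [s.rsid for s in snps if passes_qc(s)]
```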

Neuroimaging phenotypes

The diffusion-weighted MRI data were acquired from a 3-T Siemens Skyra scanner using the following parameters: isotropic voxel size (resolution) of 2 × 2 × 2 mm, five non-diffusion-weighted images (b = 0 s mm⁻²), diffusion-weighting of b = 1,000 and 2,000 s mm⁻² with 50 directions each, and acquisition time of 7 min. Whole-brain diffusion-weighted MRI scans were acquired in vivo, and fed into diffusion tensor imaging (DTI) fitting toolbox to assess brain microstructure. This analysis created the DTI outputs, including FA quantitative diffusion maps. Next, the DTI FA images were fed into the tract-based spatial statistics analysis¹⁰¹, resulting in the skeletonized images. Details of the image acquisition, quality control and processing are described elsewhere (refer to https://biobank.ctsu.ox.ac.uk/crystal/crystal/docs/brain_mri.pdf for the full protocol)¹⁰². We did not use the imaging-derived phenotypes released by the UK Biobank. Instead, we averaged the skeletonized images of five standard-space tract masks defined by Rojkova et al.¹⁰³ by following a processing protocol similar to the ones applied by the UK Biobank and the ENIGMA teams (http://enigma.ini.usc.edu/protocols/dti-protocols). Five left hemisphere white matter tracts that we investigated here are the long segment of the arcuate fasciculus; the SLF subdivisions I, II and III; and the uncinate fasciculus.

Genome-wide association scanning

GWASs were performed separately for each of the neuroimaging phenotypes using imputed genotyping data, with PLINK (v1.9)¹⁰⁴. Categorical and continuous covariates included age, sex, genotype array type and assessment centre. To avoid possible confounding effects related to ancestry, we also included the first ten genetic principal components capturing population genetic diversity. These covariates were handled in a pre-residualization step: each endophenotype vector was regressed on the covariates in a multiple linear regression and replaced by the corresponding residuals. Additionally, a rank-based inverse normal transformation was applied so that the endophenotypes were approximately normally distributed.
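The residualization and rank-based inverse normal transformation can be sketched as below. This is a simplified illustration: it regresses on a single covariate rather than the full covariate set, and it assumes the Blom offset (c = 3/8) for the transformation, which the text does not specify. Function names and toy values are hypothetical.

```python
from statistics import NormalDist

def residualize(y, x):
    """Residuals from a simple linear regression of y on one covariate x.

    Simplification: the study used a multiple regression on all covariates.
    """
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]

def rank_inverse_normal(values, c=3.0 / 8.0):
    """Rank-based inverse normal transform (assumed Blom offset c = 3/8)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    rank = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            rank[order[k]] = (i + j) / 2 + 1   # ties share the mean rank
        i = j + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - c) / (n - 2 * c + 1)) for r in rank]

# Toy FA values and ages (hypothetical)
fa = [0.42, 0.47, 0.45, 0.50, 0.40]
age = [61, 48, 55, 44, 66]
phenotype = rank_inverse_normal(residualize(fa, age))
```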

Local genetic correlations

We identified a list of overlapping loci using 2,495 LD blocks covering the whole human genome provided in the LAVA⁷³ partitioning algorithm GitHub repository (https://github.com/cadeleeuw/lava-partitioning) and 1,609 genome-wide significant (P < 5 × 10⁻⁸) SNPs in our F_gRI-D summary statistics. This resulted in 18 LD blocks. We then used LAVA to estimate local genetic correlations between F_gRI-D and the five aforementioned white matter tracts. LAVA estimates local heritability for each of these 18 LD blocks, and for each considered trait. For the loci that explained a significant proportion (nominally significant SNP heritability estimate, P < 0.05) of the total SNP heritability of F_gRI-D and white matter tracts, we proceeded to perform bivariate local genetic correlation. This extra step of filtering on the basis of local SNP heritability estimates is not mandatory but recommended⁷³. Finally, we obtained local genetic correlation estimates and associated P values, which we FDR corrected for 14 tests.
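The first step, finding which LD blocks contain at least one genome-wide significant SNP, is an interval lookup. The sketch below shows one way to do it; it assumes non-overlapping blocks with inclusive coordinates, and the function name and toy coordinates are hypothetical rather than taken from the LAVA partitioning files.

```python
from bisect import bisect_right

def blocks_with_hits(blocks, snps):
    """Return the LD blocks that contain at least one of the given SNPs.

    blocks: list of (chrom, start, end), non-overlapping, inclusive ends.
    snps:   list of (chrom, position) for genome-wide significant SNPs.
    """
    by_chrom = {}
    for chrom, start, end in blocks:
        by_chrom.setdefault(chrom, []).append((start, end))
    for intervals in by_chrom.values():
        intervals.sort()

    hits = set()
    for chrom, pos in snps:
        intervals = by_chrom.get(chrom, [])
        # Index of the last block whose start is <= pos
        i = bisect_right(intervals, (pos, float("inf"))) - 1
        if i >= 0 and intervals[i][0] <= pos <= intervals[i][1]:
            hits.add((chrom, intervals[i][0], intervals[i][1]))
    return sorted(hits)

# Toy blocks and SNP positions (hypothetical)
blocks = [(20, 1, 100), (20, 101, 200), (1, 1, 50)]
snps = [(20, 150), (20, 160), (1, 60), (2, 5)]
selected = blocks_with_hits(blocks, snps)
```

Blocks selected in this way would then be passed to the univariate heritability filter and the bivariate tests described in the text.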

Box 1 Types of rhythm impairment phenotypes

Rhythm impairment, which is also described as atypical rhythm, refers to impaired (significantly less accurate) performance on a musical rhythm perception or production task (for example, rhythm and interval discrimination, rhythm or metre processing, beat perception and synchronization and isochronous motor timing or tapping)¹⁶. It is a broad construct that covers time-based amusia and beat deafness, inaccurate beat synchronization, related impairments in sensitivity to rhythmic patterns and metre of music and inconsistent motor timing^105,106. The population prevalence of rhythm impairments is estimated to be between 3.0% and 6.5% (ref. ¹⁷). The ARRH claims that rhythm impairment is comorbid with developmental language and speech disorders. While the underlying mechanisms of rhythm impairment are largely unknown, ARRH suggests a shared neurobiological and genetic ground for these traits¹⁶.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

The full GWAS summary statistics from the original 23andMe discovery studies set have been made available through 23andMe to qualified researchers under an agreement with 23andMe that protects the privacy of the 23andMe participants. Datasets will be made available at no cost for academic use. Please visit https://research.23andme.com/collaborate/#dataset-access/ for more information and to apply to access the data. Participants provided informed consent and volunteered to participate in the research online, under a protocol approved by the external AAHRPP-accredited institutional review board, Ethical and Independent Review Services. As of 2022, Ethical and Independent Review Services is part of Salus institutional review board (https://www.versiticlinicaltrials.org/salusirb). The primary neuroimaging genetics data used in this study are available via the UK Biobank website www.ukbiobank.ac.uk. The GWAS summary statistics of FA measures of five white matter tracts, which were derived from the UK Biobank brain imaging dataset, are publicly available at the MPI Archive (accession link: https://hdl.handle.net/1839/d99a85d0-537f-46a2-af19-ee5310311ec8). Genome annotation of the human genome assembly (hg19) was downloaded from the NCBI database (https://www.ncbi.nlm.nih.gov/datasets/gene/GCF_000001405.25/). Source data are provided with this paper.

Code availability

All scripts used for analyses are publicly available via the GitHub repository at https://github.com/galagoz/pleiotropyevo. This study used openly available software, specifically PLINK (http://zzz.bwh.harvard.edu/plink/) and S-PrediXcan (https://github.com/hakyimlab/MetaXcan). JTI-TWAS prediction models trained on GTEx v8 are available at the PredictDB website (http://predictdb.org and https://github.com/gamazonlab/MR-JTI/tree/master). The human frontal lobe probabilistic atlas used is available at https://storage.googleapis.com/bcblabweb/open_data.html.

Change history

23 September 2025
In the version of this article originally published, the legends of Supplementary Tables 17, 18 and 19 in the Source Data file were mixed up. The correct table legends are:
Table S17: DMRs between AMHs and archaic humans gene-set used for MAGMA gene-set enrichment analysis.
Table S18: Ancient Selective Sweeps gene-set used for MAGMA gene-set enrichment analysis.
Table S19: Human Accelerated Regions gene-set used for MAGMA gene-set enrichment analysis.
The legends have now been corrected in the Source Data file. We note that this error was limited to the labelling of the Supplementary tables, and does not affect the datasets, analyses, interpretation, or conclusions of the article, which remain unchanged.

References

Nayak, S. et al. The musical abilities, pleiotropy, language, and environment (MAPLE) framework for understanding musicality–language links across the lifespan. Neurobiol. Lang. 3, 615–664 (2022).
Politimou, N., Dalla Bella, S., Farrugia, N. & Franco, F. Born to speak and sing: musical predictors of language development in pre-schoolers. Front. Psychol. 10, 948 (2019).
Dalla Bella, S. et al. BAASTA: battery for the assessment of auditory sensorimotor and timing abilities. Behav. Res. 49, 1128–1145 (2017).
Tarar, J. M., Meisinger, E. B. & Dickens, R. H. Test review: test of word reading efficiency–second edition (TOWRE-2) by Torgesen, J. K., Wagner, R. K. & Rashotte, C. A. Can. J. Sch. Psychol. 30, 320–326 (2015).
Lundetræ, K. & Thomson, J. M. Rhythm production at school entry as a predictor of poor reading and spelling at the end of first grade. Read. Writ. 31, 215–237 (2018).
Nayak, S. et al. Musical rhythm abilities and risk for developmental speech-language problems and disorders: epidemiological and polygenic associations. Preprint at PsyArXiv https://doi.org/10.31234/osf.io/kcgp5 (2024).
Mehr, S. A. et al. Universality and diversity in human song. Science 366, eaax0868 (2019).
Savage, P. E. et al. Music as a coevolved system for social bonding. Behav. Brain Sci. 44, e59 (2021).
Tierney, A. & Kraus, N. Auditory–motor entrainment and phonological skills: precise auditory timing hypothesis (PATH). Front. Hum. Neurosci. 8, 949 (2014).
Patel, A. D. Vocal learning as a preadaptation for the evolution of human beat perception and synchronization. Philos. Trans. R. Soc. B 376, 20200326 (2021).
Dehaene, S. & Cohen, L. Cultural recycling of cortical maps. Neuron 56, 384–398 (2007).
Anderson, M. L. Evolution of cognitive function via redeployment of brain areas. Neuroscientist 13, 13–21 (2007).
Asano, R., Boeckx, C. & Fujita, K. Moving beyond domain-specific versus domain-general options in cognitive neuroscience. Cortex 154, 259–268 (2022).
Zentner, M. & Eerola, T. Rhythmic engagement with music in infancy. Proc. Natl Acad. Sci. USA 107, 5768–5773 (2010).
Merchant, H., Grahn, J., Trainor, L., Rohrmeier, M. & Fitch, W. T. Finding the beat: a neural perspective across humans and non-human primates. Philos. Trans. R. Soc. B 370, 20140093 (2015).
Ladányi, E., Persici, V., Fiveash, A., Tillmann, B. & Gordon, R. L. Is atypical rhythm a risk factor for developmental speech and language disorders? Wiley Interdiscip. Rev. Cogn. Sci. 11, e1528 (2020).
Niarchou, M. et al. Genome-wide association study of musical beat synchronization demonstrates high polygenicity. Nat. Hum. Behav. 6, 1292–1309 (2022).
Doust, C. et al. Discovery of 42 genome-wide significant loci associated with dyslexia. Nat. Genet. 54, 1621–1629 (2022).
Carroll, J. M. & Snowling, M. J. Language and phonological skills in children at high risk of reading difficulties. J. Child Psychol. Psychiatry 45, 631–640 (2004).
Margari, L. et al. Neuropsychopathological comorbidities in learning disorders. BMC Neurol. 13, 198 (2013).
McArthur, G. M., Hogben, J. H., Edwards, V. T., Heath, S. M. & Mengler, E. D. On the ‘specifics’ of specific reading disability and specific language impairment. J. Child Psychol. Psychiatry 41, 869–874 (2000).
Catts, H. W., Fey, M. E., Tomblin, J. B. & Zhang, X. A longitudinal investigation of reading outcomes in children with language impairments. J. Speech Lang. Hear. Res. 45, 1142–1157 (2002).
Savage, P. E., Brown, S., Sakai, E. & Currie, T. E. Statistical universals reveal the structures and functions of human music. Proc. Natl Acad. Sci. USA 112, 8987–8992 (2015).
Jacoby, N. & McDermott, J. H. Integer ratio priors on musical rhythm revealed cross-culturally by iterated reproduction. Curr. Biol. 27, 359–370 (2017).
Bulik-Sullivan, B. K. et al. LD score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 47, 291–295 (2015).
Grotzinger, A. D. et al. Genomic structural equation modelling provides insights into the multivariate genetic architecture of complex traits. Nat. Hum. Behav. 3, 513–525 (2019).
Eising, E. et al. Genome-wide analyses of individual differences in quantitatively assessed reading- and language-related skills in up to 34,000 people. Proc. Natl Acad. Sci. USA 119, e2202764119 (2022).
Rajagopal, V. M. et al. Genome-wide association study of school grades identifies genetic overlap between language ability, psychopathology and creativity. Sci. Rep. 13, 429 (2023).
Mekki, Y. et al. The genetic architecture of language functional connectivity. Neuroimage 249, 118795 (2022).
Carrion-Castillo, A. et al. Genetic effects on planum temporale asymmetry and their limited relevance to neurodevelopmental disorders, intelligence or educational attainment. Cortex 124, 137–153 (2020).
Grotzinger, A. D. et al. Genetic architecture of 11 major psychiatric disorders at biobehavioral, functional genomic and molecular genetic levels of analysis. Nat. Genet. 54, 548–559 (2022).
Hendrix, P. et al. Structure and expression of a 72-kDa regulatory subunit of protein phosphatase 2A. Evidence for different size forms produced by alternative splicing. J. Biol. Chem. 268, 15267–15276 (1993).
Mägi, R. & Morris, A. P. GWAMA: software for genome-wide association meta-analysis. BMC Bioinform. 11, 288 (2010).
Li, X. & Zhu, X. in Statistical Human Genetics: Methods and Protocols (ed. Elston, R. C.) (Springer, 2017).
Gamazon, E. & Zhou, D. JTI. Zenodo https://doi.org/10.5281/zenodo.3842289 (2020).
Zhou, D. et al. A unified framework for joint-tissue transcriptome-wide association and Mendelian randomization analysis. Nat. Genet. 52, 1239–1246 (2020).
Barbeira, A. N. et al. Exploring the phenotypic consequences of tissue specific gene expression variation inferred from GWAS summary statistics. Nat. Commun. 9, 1825 (2018).
Mi, H. et al. Protocol update for large-scale genome and gene function analysis with the PANTHER classification system (v.14.0). Nat. Protoc. 14, 703–721 (2019).
Thomas, P. D. et al. PANTHER: making genome-scale phylogenetics accessible to all. Protein Sci. 31, 8–22 (2022).
Ashburner, M. et al. Gene Ontology: tool for the unification of biology. Nat. Genet. 25, 25–29 (2000).
The Gene Ontology Consortium. The Gene Ontology resource: enriching a GOld mine. Nucleic Acids Res. 49, D325–D334 (2021).
Kasdan, A. V. et al. Identifying a brain network for musical rhythm: a functional neuroimaging meta-analysis and systematic review. Neurosci. Biobehav. Rev. 136, 104588 (2022).
Nandi, B. et al. Musical training facilitates exogenous temporal attention via delta phase entrainment within a sensorimotor network. J. Neurosci. 43, 3365–3378 (2023).
Finucane, H. K. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 47, 1228–1235 (2015).
Nott, A. et al. Brain cell type-specific enhancer–promoter interactome maps and disease-risk association. Science 366, 1134–1139 (2019).
Fitch, W. T. & Martins, M. D. Hierarchical processing in music, language, and action: Lashley revisited. Ann. N. Y. Acad. Sci. 1316, 87–104 (2014).
Lense, M. D., Ladányi, E., Rabinowitch, T.-C., Trainor, L. & Gordon, R. Rhythm and timing as vulnerabilities in neurodevelopmental disorders. Philos. Trans. R. Soc. B 376, 20200327 (2021).
Killin, A. The origins of music: evidence, theory, and prospects. Music Sci. 1, 205920431775197 (2018).
Patel, A. D. in The Science–Music Borderlands (eds Margulis, E. H. et al.) (MIT, 2023); https://doi.org/10.7551/mitpress/14186.003.0006
Patel, A. D. The evolutionary biology of musical rhythm: was Darwin wrong? PLoS Biol. 12, e1001821 (2014).
Vandermosten, M., Hoeft, F. & Norton, E. S. Integrating MRI brain imaging studies of pre-reading children with current theories of developmental dyslexia: a review and quantitative meta-analysis. Curr. Opin. Behav. Sci. 10, 155–161 (2016).
Wandell, B. A. & Le, R. K. Diagnosing the neural circuitry of reading. Neuron 96, 298–311 (2017).
Mehr, S. A., Krasnow, M. M., Bryant, G. A. & Hagen, E. H. Origins of music in credible signaling. Behav. Brain Sci. 44, e60 (2021).
Reilly, S. K. et al. Evolutionary changes in promoter and enhancer activity during human corticogenesis. Science 347, 1155–1159 (2015).
Vernot, B. & Akey, J. M. Resurrecting surviving neandertal lineages from modern human genomes. Science 343, 1017–1021 (2014).
Vernot, B. et al. Excavating Neandertal and Denisovan DNA from the genomes of Melanesian individuals. Science 352, 235–239 (2016).
Pollard, K. S., Hubisz, M. J., Rosenbloom, K. R. & Siepel, A. Detection of nonneutral substitution rates on mammalian phylogenies. Genome Res. 20, 110–121 (2010).
McArthur, E., Rinker, D. C. & Capra, J. A. Quantifying the contribution of Neanderthal introgression to the heritability of complex traits. Nat. Commun. 12, 4481 (2021).
Sullivan, P. F. et al. Leveraging base-pair mammalian constraint to understand genetic variation and human disease. Science 380, eabn2937 (2023).
Zeng, J. et al. Widespread signatures of natural selection across human complex traits and functional genomic categories. Nat. Commun. 12, 1164 (2021).
de Leeuw, C. A., Mooij, J. M., Heskes, T. & Posthuma, D. MAGMA: generalized gene-set analysis of GWAS data. PLoS Comput. Biol. 11, e1004219 (2015).
Peyrégne, S., Boyle, M. J., Dannemann, M. & Prüfer, K. Detecting ancient positive selection in humans using extended lineage sorting. Genome Res. 27, 1563–1572 (2017).
Lindblad-Toh, K. et al. A high-resolution map of human evolutionary constraint using 29 mammals. Nature 478, 476–482 (2011).
Pollard, K. S. et al. An RNA gene expressed during cortical development evolved rapidly in humans. Nature 443, 167–172 (2006).
Prabhakar, S., Noonan, J. P., Pääbo, S. & Rubin, E. M. Accelerated evolution of conserved noncoding sequences in humans. Science 314, 786 (2006).
Bird, C. P. et al. Fast-evolving noncoding sequences in the human genome. Genome Biol. 8, R118 (2007).
Gokhman, D. et al. Differential DNA methylation of vocal and facial anatomy genes in modern humans. Nat. Commun. 11, 1189 (2020).
Gordon, R. L. et al. Linking the genomic signatures of human beat synchronization and learned song in birds. Philos. Trans. R. Soc. B 376, 20200329 (2021).
Kotz, S. A., Ravignani, A. & Fitch, W. T. The evolution of rhythm processing. Trends Cogn. Sci. 22, 896–910 (2018).
Albers, P. K. & McVean, G. Dating genomic variants and shared ancestry in population-scale sequencing data. PLoS Biol. 18, e3000586 (2020).
Hublin, J.-J. et al. New fossils from Jebel Irhoud, Morocco and the pan-African origin of Homo sapiens. Nature 546, 289–292 (2017).
Head, R. A. et al. Clinical and genetic spectrum of pyruvate dehydrogenase deficiency: dihydrolipoamide acetyltransferase (E2) deficiency. Ann. Neurol. 58, 234–241 (2005).
Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016).
Werme, J., van der Sluis, S., Posthuma, D. & de Leeuw, C. A. An integrated framework for local genetic correlation analysis. Nat. Genet. 54, 274–282 (2022).
Janelle, F., Iorio-Morin, C., D’amour, S. & Fortin, D. Superior longitudinal fasciculus: a review of the anatomical descriptions with functional correlates. Front. Neurol. 13, 794618 (2022).
Blecher, T., Tal, I. & Ben-Shachar, M. White matter microstructural properties correlate with sensorimotor synchronization abilities. Neuroimage 138, 1–12 (2016).
Vaquero, L., Ramos-Escobar, N., François, C., Penhune, V. & Rodríguez-Fornells, A. White-matter structural connectivity predicts short-term melody and rhythm learning in non-musicians. Neuroimage 181, 252–262 (2018).
Catani, M. & Thiebaut de Schotten, M. Atlas of Human Brain Connections (Oxford Univ. Press, 2012).
Makris, N. et al. Segmentation of subcomponents within the superior longitudinal fascicle in humans: a quantitative, in vivo, DT-MRI study. Cereb. Cortex 15, 854–869 (2005).
Dzulko, M., Pons, M., Henke, A., Schneider, G. & Krämer, O. H. The PP2A subunit PR130 is a key regulator of cell development and oncogenic transformation. Biochim. Biophys. Acta Rev. Cancer 1874, 188453 (2020).
Wesseldijk, L. W. et al. Notes from Beethoven’s genome. Curr. Biol. 34, R233–R234 (2024).
Fitch, W. T. Empirical approaches to the study of language evolution. Psychon. Bull. Rev. 24, 3–33 (2017).
Honing, H., Merchant, H., Háden, G. P., Prado, L. & Bartolo, R. Rhesus monkeys (Macaca mulatta) detect rhythmic groups in music, but not the beat. PLoS ONE 7, e51369 (2012).
Abdellaoui, A., Yengo, L., Verweij, K. J. H. & Visscher, P. M. 15 years of GWAS discovery: realizing the promise. Am. J. Hum. Genet. 110, 179–194 (2023).
Non, A. L. & Cerdeña, J. P. Considerations, caveats, and suggestions for the use of polygenic scores for social and behavioral traits. Behav. Genet. 54, 34–41 (2024).
Wesseldijk, L. W., Abdellaoui, A., Gordon, R. L., Ullén, F. & Mosing, M. A. Using a polygenic score in a family design to understand genetic influences on musicality. Sci. Rep. 12, 14658 (2022).
Gustavson, D. E. et al. Heritability of childhood music engagement and associations with language and executive function: insights from the adolescent brain cognitive development (ABCD) study. Behav. Genet. 53, 189–207 (2023).
Sella, G. & Barton, N. H. Thinking about the evolution of complex traits in the era of genome-wide association studies. Annu. Rev. Genom. Hum. Genet. 20, 461–493 (2019).
Naqvi, S. et al. Shared heritability of human face and brain shape. Nat. Genet. 53, 830–839 (2021).
Berisa, T. & Pickrell, J. K. Approximately independent linkage disequilibrium blocks in human populations. Bioinformatics 32, 283–285 (2016).
Auton, A. et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
Lonsdale, J. et al. The Genotype-Tissue Expression (GTEx) project. Nat. Genet. 45, 580–585 (2013).
Wang, D. et al. Comprehensive functional genomic resource and integrative model for the human brain. Science 362, eaat8464 (2018).
Lappalainen, T. et al. Transcriptome and genome sequencing uncovers functional variation in humans. Nature 501, 506–511 (2013).
Campoy, E., Puig, M., Yakymenko, I., Lerga-Jaso, J. & Cáceres, M. Genomic architecture and functional effects of potential human inversion supergenes. Philos. Trans. R. Soc. B 377, 20210209 (2022).
Ward, J. H. Hierarchical grouping to optimize an objective function. J. Am. Stat. Assoc. 58, 236–244 (1963).
Ernst, J. & Kellis, M. ChromHMM: automating chromatin-state discovery and characterization. Nat. Methods 9, 215–216 (2012).
Church, D. M. et al. Modernizing reference genome assemblies. PLoS Biol. 9, e1001091 (2011).
Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).
Smith, S. M. et al. Tract-based spatial statistics: voxelwise analysis of multi-subject diffusion data. NeuroImage 31, 1487–1505 (2006).
Alfaro-Almagro, F. et al. Image processing and quality control for the first 10,000 brain imaging datasets from UK Biobank. Neuroimage 166, 400–424 (2018).
Rojkova, K. et al. Atlasing the frontal lobe connections and their variability due to age and education: a spherical deconvolution tractography study. Brain Struct. Funct. 221, 1751–1766 (2016).
Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience https://doi.org/10.1186/s13742-015-0047-8 (2015).
Peretz, I. & Vuvan, D. T. Prevalence of congenital amusia. Eur. J. Hum. Genet. 25, 625–630 (2017).
Sowiński, J. & Dalla Bella, S. Poor synchronization to the beat may result from deficient auditory–motor mapping. Neuropsychologia 51, 1952–1963 (2013).

Acknowledgements

This project was supported in part by funding from the National Institute on Deafness and Other Communication Disorders, the Office of Behavioural and Social Sciences Research, and the Office of the Director of the National Institutes of Health under Award Numbers R01DC016977, K18DC017383 and DP2HD098859. G.A., E.E., G.B. and S.E.F. are supported by the Max Planck Society. G.B. is also supported by the German Federal Ministry of Education and Research. The funders had no role in study design, data collection and analysis, the decision to publish or the preparation of the manuscript. S.E.F. is a member of the Centre for Academic Research and Training in Anthropogeny. This research was conducted using the UK Biobank resource under application no. 79683. We thank the research participants and employees of 23andMe for making this work possible. The 23andMe Research Team can be contacted at joyce@23andme.com.

Funding

Open access funding provided by Max Planck Society.

Author information

These authors contributed equally: Simon E. Fisher, Reyna L. Gordon.

Authors and Affiliations

Language and Genetics Department, Max Planck Institute for Psycholinguistics, Nijmegen, the Netherlands
Gökberk Alagöz, Else Eising, Giacomo Bignardi & Simon E. Fisher
Department of Otolaryngology—Head and Neck Surgery, Vanderbilt University Medical Center, Nashville, TN, USA
Yasmina Mekki & Reyna L. Gordon
Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
Yasmina Mekki, Nancy J. Cox & Reyna L. Gordon
Max Planck School of Cognition, Leipzig, Germany
Giacomo Bignardi
23andMe, Inc., Sunnyvale, CA, USA
Pierre Fontanillas, Stella Aslibekyan, Adam Auton, Elizabeth Babalola, Robert K. Bell, Jessica Bielenberg, Jonathan Bowes, Katarzyna Bryc, Ninad S. Chaudhary, Daniella Coker, Sayantan Das, Emily DelloRusso, Sarah L. Elson, Nicholas Eriksson, Teresa Filshtein, Will Freyman, Zach Fuller, Chris German, Julie M. Granka, Karl Heilbron, Alejandro Hernandez, Barry Hicks, David A. Hinds, Ethan M. Jewett, Yunxuan Jiang, Katelyn Kukar, Alan Kwong, Yanyu Liang, Keng-Han Lin, Bianca A. Llamas, Matthew H. McIntyre, Steven J. Micheletti, Meghan E. Moreno, Priyanka Nandakumar, Dominique T. Nguyen, Jared O’Connell, Aaron A. Petrakovitz, G. David Poznik, Alexandra Reynoso, Shubham Saini, Morgan Schumacher, Leah Selcer, Anjali J. Shastri, Janie F. Shelton, Jingchunzi Shi, Suyash Shringarpure, Qiaojuan Jane Su, Susana A. Tat, Vinh Tran, Joyce Y. Tung, Xin Wang, Wei Wang, Catherine H. Weldon, Peter Wilton & Corinna D. Wong
Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, UK
Michel G. Nivard
MRC Integrative Epidemiology Unit, University of Bristol, Bristol, UK
Michel G. Nivard
Department of Psychology, University of Edinburgh, Edinburgh, UK
Michelle Luciano
Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, the Netherlands
Simon E. Fisher
Vanderbilt Brain Institute, Vanderbilt University, Nashville, TN, USA
Reyna L. Gordon
Department of Hearing & Speech Sciences, Vanderbilt University Medical Center, Nashville, TN, USA
Reyna L. Gordon

Authors

Gökberk Alagöz
Else Eising
Yasmina Mekki
Giacomo Bignardi
Pierre Fontanillas
Michel G. Nivard
Michelle Luciano
Nancy J. Cox
Simon E. Fisher
Reyna L. Gordon

Consortia

23andMe Research Team

Stella Aslibekyan, Adam Auton, Elizabeth Babalola, Robert K. Bell, Jessica Bielenberg, Jonathan Bowes, Katarzyna Bryc, Ninad S. Chaudhary, Daniella Coker, Sayantan Das, Emily DelloRusso, Sarah L. Elson, Nicholas Eriksson, Teresa Filshtein, Pierre Fontanillas, Will Freyman, Zach Fuller, Chris German, Julie M. Granka, Karl Heilbron, Alejandro Hernandez, Barry Hicks, David A. Hinds, Ethan M. Jewett, Yunxuan Jiang, Katelyn Kukar, Alan Kwong, Yanyu Liang, Keng-Han Lin, Bianca A. Llamas, Matthew H. McIntyre, Steven J. Micheletti, Meghan E. Moreno, Priyanka Nandakumar, Dominique T. Nguyen, Jared O’Connell, Aaron A. Petrakovitz, G. David Poznik, Alexandra Reynoso, Shubham Saini, Morgan Schumacher, Leah Selcer, Anjali J. Shastri, Janie F. Shelton, Jingchunzi Shi, Suyash Shringarpure, Qiaojuan Jane Su, Susana A. Tat, Vinh Tran, Joyce Y. Tung, Xin Wang, Wei Wang, Catherine H. Weldon, Peter Wilton & Corinna D. Wong

Contributions

G.A., E.E., N.J.C., R.L.G. and S.E.F. designed the research. G.A., E.E., Y.M. and G.B. performed the research. G.A., E.E. and Y.M. analysed the data. G.A. wrote the initial draft of the paper. E.E., Y.M., G.B., P.F., M.G.N., M.L., R.L.G. and S.E.F. provided critical feedback and commented on the paper.

Corresponding authors

Correspondence to Gökberk Alagöz, Simon E. Fisher or Reyna L. Gordon.

Ethics declarations

Competing interests

P.F. is employed by and holds stock or stock options in 23andMe, Inc. The other authors declare no competing interests.

Peer review

Peer review information

Nature Human Behaviour thanks Päivi Onkamo and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Figs. 1–9.

Reporting Summary

Peer Review File

Source data

Source Data Extended Data Fig. 1/Table 1.

Supplementary Tables 1–24. Statistical analysis results and source data.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Alagöz, G., Eising, E., Mekki, Y. et al. The shared genetic architecture and evolution of human language and musical rhythm. Nat Hum Behav 9, 376–390 (2025). https://doi.org/10.1038/s41562-024-02051-y

Received: 02 October 2023
Accepted: 07 October 2024
Published: 21 November 2024
Version of record: 21 November 2024
Issue date: February 2025
DOI: https://doi.org/10.1038/s41562-024-02051-y
