A role of non-Mendelian inheritance in the genetics of complex, age-related traits is increasingly recognized. Recently, we reported on two inheritable clusters of SNPs in extensive genome-wide linkage disequilibrium (LD) in the Framingham Heart Study (FHS), which were associated with the phenotype of premature death. Here we address biologically related properties of these two clusters. These clusters are unlikely to have been selected at random, because they differ functionally and structurally from matched sets of randomly selected SNPs. For example, SNPs in LD from each cluster are highly significantly enriched in genes in general (p=7.1×10⁻²² and p=5.8×10⁻¹⁸) and in short genes in particular (p=1.4×10⁻⁴⁷ and p=4.6×10⁻⁷). Mapping of SNPs in LD to genes resulted in two partly overlapping networks of 1764 and 4806 genes. Genes in both networks were enriched in developmental processes and in biological processes tightly linked with development, including biological adhesion, cellular component organization, locomotion, localization, and signaling (p<10⁻⁴, q<10⁻⁴ for each category). Thorough analysis suggests connections of these genetic networks with different stages of embryogenesis and highlights the biological interlink among specific processes enriched for genes from these networks. The results suggest that coordinated action of biological processes during embryogenesis may generate genome-wide networks of genetic variants, which may influence complex age-related phenotypes characterizing health span and lifespan.
Despite advances in genetic association studies, optimism in the field is tempered by the so-called missing heritability problem, i.e., the proportions of phenotypic variance explained by genes detected in prior studies are much smaller than the heritability estimates of those traits. This problem should be considered alongside two fundamental results of genome-wide association studies (GWAS). First, complex traits are likely influenced by a large number of alleles from genes spread throughout the entire genome. Second, individually these alleles confer only tiny risks of a complex trait. These results imply that, to explain clinically meaningful genetic susceptibility in an individual, his or her genome should include a large number of risk alleles. This is a major challenge.
Indeed, given the clustering of complex human traits in families, the heritability of these traits has to be explained in terms of transmission of the risk alleles from parents to offspring. Originally, GWAS followed the so-called common disease–common variant (CDCV) hypothesis, which assumes that a common trait is influenced by a few common variants in a person’s genome. Transmission of the risk alleles from parents to offspring then follows Mendelian logic. GWAS, however, provided evidence that common traits are likely influenced by a large number of variants spread throughout the genome, which strengthens the Biometricians’ logic. This logic poses theoretical challenges in terms of transmission of the risk alleles to offspring. Fisher, in his classic work, advocated a quantitative mechanism for explaining the inheritance of polygenic traits. The key to this mechanism is the concept of allelic equivalence, i.e., that different alleles confer risk of, rather than cause, the same trait. Common traits are then considered non-Mendelian, while the risk alleles are inherited by random (i.e., Mendelian) segregation.
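Fisher’s argument can be illustrated with a toy simulation. The sketch below is not part of the analyses in this paper; it assumes a purely additive trait, unlinked loci, random mating, and arbitrary (hypothetical) allele frequencies and effect sizes. Under these assumptions, alleles that individually explain little variance and segregate at random still produce familial resemblance: the regression slope of offspring on mid-parent phenotype should come out close to the narrow-sense heritability h².

```python
import random


def simulate_midparent_offspring(n_loci=100, n_families=2000, h2=0.5, seed=1):
    """Toy additive model: many unlinked loci, each allele with a small effect,
    random mating and random (Mendelian) segregation of alleles.

    Returns (slope, max_single_locus_share): the least-squares slope of offspring
    phenotype on mid-parent phenotype (expected to be close to h2) and the largest
    expected fraction of phenotypic variance contributed by any single locus.
    """
    rng = random.Random(seed)
    freqs = [rng.uniform(0.1, 0.9) for _ in range(n_loci)]  # hypothetical allele frequencies
    effects = [rng.gauss(0.0, 1.0) for _ in range(n_loci)]  # hypothetical per-allele effects
    # Expected additive genetic variance per locus under Hardy-Weinberg proportions
    locus_var = [2 * p * (1 - p) * a * a for p, a in zip(freqs, effects)]
    var_g = sum(locus_var)
    var_p = var_g / h2                # total phenotypic variance implied by h2
    sd_env = (var_p - var_g) ** 0.5   # environmental SD needed to reach that h2

    def random_parent():
        # Two alleles per locus, coded 1 (effect allele) or 0
        return [(int(rng.random() < p), int(rng.random() < p)) for p in freqs]

    def gamete(parent):
        # Random segregation: one allele per locus, independently across loci
        return [rng.choice(pair) for pair in parent]

    def phenotype(genotype):
        g = sum(a * (x + y) for a, (x, y) in zip(effects, genotype))
        return g + rng.gauss(0.0, sd_env)

    midparent, offspring = [], []
    for _ in range(n_families):
        mother, father = random_parent(), random_parent()
        child = list(zip(gamete(mother), gamete(father)))
        midparent.append((phenotype(mother) + phenotype(father)) / 2)
        offspring.append(phenotype(child))

    # Least-squares slope = cov(offspring, mid-parent) / var(mid-parent)
    mx = sum(midparent) / n_families
    my = sum(offspring) / n_families
    cov = sum((x - mx) * (y - my) for x, y in zip(midparent, offspring))
    var = sum((x - mx) ** 2 for x in midparent)
    return cov / var, max(locus_var) / var_p


if __name__ == "__main__":
    slope, share = simulate_midparent_offspring()
    print(f"mid-parent/offspring slope: {slope:.2f}; largest single-locus share: {share:.3f}")
```

Raising `n_loci` while keeping effects of comparable size should shrink the largest single-locus share, while the slope stays near h².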
If complex traits are influenced by a large number of risk alleles present in a person’s genome, a fundamental question arises: are these risk alleles always transmitted from parents to offspring at random? A number of recent studies have documented co-adaptation of physically unlinked genes at loci on different chromosomes participating in the same biological processes (BPs) in humans [2,3] and in mammals [4–6]. There is also growing evidence that inter-chromosomal linkage may be much more common than previously believed [7–11]. Co-adaptation of alleles on the basis of biological mechanisms implies functional linkage. Although the mechanism of functional linkage remains unclear, it may be associated with regulatory mechanisms controlling transcription of genes located on different autosomes [12,13]. Functionally linked alleles are manifested phenotypically and can be transmitted to offspring as a complex, as observed in various studies [3,14–16]. These observations suggest that the risk alleles in a person’s genome could be transmitted to offspring non-randomly.
The theoretical basis of the random mechanism of inheritance is Mendel’s Second Law, which is believed to have been confirmed by a number of cytological experiments. Less expected is the fact that the very first cytological evidence concerned non-random segregation of autosomes during meiosis. Since then, non-random segregation has been observed in various species [18–22]. Despite this, such evidence remains outside the mainstream of most genetic studies. This experimental evidence, however, may represent a phenotypic manifestation of functional linkage among physically unlinked genes, implying that alleles in such genes can function as a complex and be transmitted to offspring together.
Genome-wide functional linkage of alleles implies that there should be fundamental biomolecular mechanisms generating and supporting functional co-adaptation. A relevant mechanism could be associated with developmental processes, which affect the entire organism. Barker’s hypothesis of developmental plasticity also links developmental processes with phenotypes characteristic of late life, including geriatric diseases and premature death.
Recently, we reported on inheritable genome-wide networks of SNPs in chromosome-wide and inter-chromosomal linkage disequilibrium (LD) that were associated with premature death and major human diseases in the Framingham Heart Study (FHS) [26–28]. The analyses in [26–28] suggested that these networks were unlikely to be a matter of chance or technical artefacts. In this work we provide further evidence for the non-random origin of the networks of SNPs in genome-wide LD. We also show that the genes mapped from these genome-wide SNP networks comprise a complex system of interlinked BPs essential for embryogenesis.
The FHS includes 14,428 participants in three cohorts: the FHS original cohort (launched in 1948; 5,209 respondents), the FHS Offspring cohort (FHSO, launched in 1971–1975; 5,124 offspring), and the 3rd Generation cohort (launched in 2002; 4,095 grandchildren). The FHS is a population-based longitudinal study that has followed its participants for up to about 60 years. Selection criteria, study design and phenotypic data have been described previously [29–31]. SNP data are available for 9,274 participants from whom DNA samples were drawn. Genotyping in the FHS was done using the Affymetrix 500K array (250K Nsp and 250K Sty) and an independent 50K Human Gene Focused array, which have no SNPs in common.
In all analyses we used autosomal SNPs from the Affymetrix 500K and 50K arrays that passed quality control (QC) tests. QC was as in [26–28], i.e., SNPs/individuals were excluded if the call rate was <90%, the Hardy-Weinberg equilibrium p-value was <10⁻², or Mendel errors exceeded 2%. The analyses in [26–28] identified two partly overlapping sets of SNPs in extensive intra- and inter-chromosomal LD among those showing associations with cardiovascular diseases, cancer, and premature death. This LD was observed among SNPs on the two independent Affymetrix 500K and 50K arrays. In earlier work we identified complexes of four (Y complex) and five (G complex) SNPs on the Affymetrix 50K array that showed strong intra-set LD (r²>0.8) and no inter-set LD. We evaluated LD of these nine SNPs with the other SNPs on both arrays (after QC) using PLINK. We used the cut-off of r²=0.02, established empirically in earlier work, to define non-random LD, and selected SNPs with minor allele frequency (MAF) 0.009≤MAF≤0.499. The largest sets of SNPs in both nominal LD (r²≥0.02) and moderately strong LD (r²≥0.50) were identified for rs1390694 from the Y complex and for rs9330200 from the G complex. The Y set included 3970 SNPs and the G set included 20254 SNPs; these are called the reference SNP sets hereafter.
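The LD-based selection step can be sketched as follows. The original analysis used PLINK; this is not PLINK’s code. It assumes genotypes coded as allele counts (0, 1, 2, with `None` for a missing call) and estimates r² as the squared Pearson correlation of those counts, a genotype-based measure that does not require phasing and may differ slightly from the values PLINK reports. The function names and the dictionary layout of the SNP panel are hypothetical; only the thresholds (r² ≥ 0.02, 0.009 ≤ MAF ≤ 0.499) come from the text.

```python
from typing import Dict, List, Optional

Genotypes = List[Optional[int]]  # allele counts 0/1/2 per individual, None = missing call

# Selection thresholds quoted in the text
R2_CUTOFF = 0.02                 # nominal (non-random) LD
MAF_MIN, MAF_MAX = 0.009, 0.499  # note: MAF exactly 0.5 falls outside this window


def minor_allele_frequency(genotypes: Genotypes) -> float:
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    p = sum(called) / (2 * len(called))
    return min(p, 1 - p)


def genotype_r2(x: Genotypes, y: Genotypes) -> float:
    """Squared Pearson correlation of allele counts over individuals called at both SNPs."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n == 0:
        return 0.0
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return 0.0  # no variation in this sample: correlation undefined, treat as no LD
    return sxy * sxy / (sxx * syy)


def snps_in_ld_with(lead: Genotypes, panel: Dict[str, Genotypes]) -> Dict[str, float]:
    """Return {snp_id: r2} for SNPs inside the MAF window with r2 >= R2_CUTOFF to the lead SNP."""
    selected = {}
    for snp_id, genotypes in panel.items():
        if not (MAF_MIN <= minor_allele_frequency(genotypes) <= MAF_MAX):
            continue
        r2 = genotype_r2(lead, genotypes)
        if r2 >= R2_CUTOFF:
            selected[snp_id] = r2
    return selected
```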
All SNPs were annotated and mapped to genes using the US FDA’s Array Track. Because of the large number of genes for the reference SNP sets, intergenic SNPs were not mapped to genes; annotation thus resulted in 1764 (Y set) and 4806 (G set) genes. We analyzed enrichment of genes in biological process Gene Ontology (GO) terms [35,36] in the Y and G sets separately. To ensure robustness of the results, three GO-based bioinformatics tools were used. For both the Y and G sets we used GOFFA and high-throughput GoMiner. The web-based tool DAVID was additionally used in the analyses of the Y set alone, because DAVID was limited to 3000 genes.
These tools evaluate enrichment of genes in GO terms relative to a list of all genes in the genome. Optionally, GoMiner can also use a user-supplied reference gene set. In our analysis we used both options, i.e., the GO database gene set (called the “auto-generated” set) and the set of genes for qualified SNPs on the arrays. DAVID augments and integrates annotations from different databases.
All three tools provide Fisher’s exact test p-values. The nominal significance level for this test (p≤0.05) was used to pre-select GO terms. GoMiner and DAVID also provided false discovery rate (FDR) q-values, which help to offset the multiple comparisons problem in analyses of gene enrichment in specific GO terms [40,41]. As a rule, a q-value cut-off of 0.1 was used, at the maximal number of randomizations (1000 in GoMiner). Less stringent q-values were used in analyses of gene enrichment in small BPs. Although large high-level terms are more reliable and small GO terms can be more readily enriched by chance, small terms are more specific and may be crucial for gaining insight into the biological essence of the results and for revealing interlinks between the underlying BPs. Interlink between such processes warrants less stringent p- and q-value thresholds for the corresponding small GO terms, because a biologically motivated interlink implies a smaller chance of false findings.
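The one-sided Fisher’s exact test used for GO enrichment is the upper tail of a hypergeometric distribution, which can be sketched as below. The q-values here use the Benjamini-Hochberg procedure, which is a swapped-in analytic alternative: GoMiner estimates FDR by randomization (up to 1000 randomizations), so its q-values will generally differ. Function and argument names are illustrative.

```python
from math import comb
from typing import List


def enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided Fisher's exact (hypergeometric) p-value for over-representation.

    N: genes in the background list; K: background genes annotated to the GO term;
    n: genes in the study list; k: study genes annotated to the term.
    Returns P(X >= k) for X ~ Hypergeometric(N, K, n).
    """
    upper = min(K, n)
    # math.comb returns 0 when the lower argument exceeds the upper one
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)  # exact integer arithmetic, then a single float division


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg q-values, returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q
```

Terms could then be pre-selected at p ≤ 0.05 and filtered at q ≤ 0.1, mirroring the thresholds described above.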
The analyses using the bioinformatics tools were complemented by rigorous systemic analysis of selected GO terms using manually curated information from the relevant literature and public databases, as advocated in [42,44]. Because GOFFA did not provide q-values, we validated our findings against 20 random gene sets for each reference gene set (generated from the set of genes for qualified SNPs on the 500K plus 50K arrays), i.e., by calculating and comparing enrichment statistics for GO terms in the reference and random gene sets (the random sets having the same number of genes as the reference sets). Given potential biases in analyses of GO terms (e.g., gene length bias, large gene list size, gene enrichment in BPs in the genome, and the structure of GO [36,45]), in some analyses we also verified whether a complex of biologically interlinked GO terms enriched in the reference SNP sets was also enriched in 20 control random SNP sets (having the same number of SNPs as the reference sets and generated from the qualified SNPs on the 500K plus 50K arrays).
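The random-set validation can be sketched as follows, assuming a single GO term, one-sided hypergeometric p-values, and random gene sets drawn without replacement from the array-based gene universe. GOFFA’s own statistics may differ, and the names used here are hypothetical.

```python
import random
from math import comb
from typing import List, Sequence, Set, Tuple


def enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N, K, n): one-sided Fisher's exact test."""
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


def compare_with_random_sets(reference: Set[str], universe: Sequence[str], term_genes: Set[str],
                             n_random: int = 20, alpha: float = 0.05,
                             seed: int = 0) -> Tuple[float, List[float], int]:
    """Enrichment p-value of the reference set versus same-sized random gene sets.

    `universe` is a list of all candidate genes (assumed to contain the reference set).
    Returns (reference p, random-set p-values, number of random sets with p <= alpha).
    """
    rng = random.Random(seed)
    N = len(universe)
    K = len(term_genes & set(universe))
    n = len(reference)

    def p_for(genes: Set[str]) -> float:
        return enrichment_p(len(genes & term_genes), n, K, N)

    ref_p = p_for(reference)
    random_ps = [p_for(set(rng.sample(list(universe), n))) for _ in range(n_random)]
    n_significant = sum(p <= alpha for p in random_ps)
    return ref_p, random_ps, n_significant
```

A reference-set result would then be viewed as robust when its p-value is smaller than those of most random sets and few random sets reach the nominal threshold.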
BPs were prioritized according to the hierarchical structure of the GO tree of terms using GOFFA and the EBI’s QuickGO tools (http://www.ebi.ac.uk/QuickGO/).
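One simple way to order enriched terms along the GO hierarchy is by their depth below the root, from general (high-level) to specific (low-level) terms. The sketch below uses a hypothetical, heavily simplified hierarchy loosely based on terms mentioned in this paper; it is not the actual GO graph (where terms can have multiple parents and several relation types), and it does not reproduce GOFFA or QuickGO.

```python
from functools import lru_cache
from typing import Dict, List, Set

# Hypothetical, simplified child -> parents map; NOT the real GO structure.
PARENTS: Dict[str, List[str]] = {
    "biological_process": [],
    "developmental process": ["biological_process"],
    "anatomical structure development": ["developmental process"],
    "system development": ["anatomical structure development"],
    "nervous system development": ["system development"],
    "brain development": ["nervous system development"],
    "hindbrain development": ["brain development"],
    "cerebellum development": ["hindbrain development"],
}


@lru_cache(maxsize=None)
def depth(term: str) -> int:
    """Length of the shortest path from the root (a GO-style 'level')."""
    parents = PARENTS[term]
    return 0 if not parents else 1 + min(depth(p) for p in parents)


def ancestors(term: str) -> Set[str]:
    """All terms reachable by following parent links upward."""
    seen: Set[str] = set()
    stack = list(PARENTS[term])
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS[t])
    return seen


def prioritize(enriched: Dict[str, float]) -> List[str]:
    """Order enriched terms from general to specific; ties broken by smaller p-value."""
    return sorted(enriched, key=lambda t: (depth(t), enriched[t]))
```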
The two reference sets of 1764 (Y set) and 4806 genes (G set) were linked to BP GO terms using GoMiner, GOFFA, and DAVID. Comparative analyses using all three tools mostly show enrichment of genes in the same BPs regardless of the tool (see Supplementary Information, Table S1). Sensitivity of the results to the choice of reference gene list was assessed using both the auto-generated (GO database) and the array-based gene lists in GoMiner for the Y and G sets. No qualitative differences were observed, implying that genes on the arrays were represented mostly in direct proportion to genes involved in BPs in the genome (see Supplementary Information, Table S1 for the Y set). Accordingly, we discuss the results of gene enrichment analysis for the Y and G sets based on the auto-generated gene list in GoMiner (see Supplementary Information, Tables S2 and S3).
Genome-wide LD implies coherent behavior of SNPs, seen as statistical clustering; this property is not typical of traditional GWAS. It can be used efficiently to show that a random origin of the observed LD is implausible. We support this with three lines of evidence highlighting biologically motivated functional and structural differences between the two reference sets of SNPs in LD and the randomly selected SNP sets.
First, mapping the reference SNPs to genes resulted in 2318 gene entries (including repeats) and 1764 unique genes in the Y set, and 10026 gene entries and 4806 unique genes in the G set. Annotation of the same number of SNPs to genes in the 20 control random sets for each reference set shows that genes are significantly underrepresented in the random sets compared with the reference sets. Highly significant enrichment in the Y and G reference sets is observed for both unique genes (Figure 1) and all gene entries (Supplementary Information, Figure S1). This finding indicates a functional difference between the random SNP sets and the sets of SNPs in LD.
Second, because genes with a large number of SNPs have higher chances of being selected by chance, we evaluated the difference in the number of SNPs per gene (a proxy for gene length) between genes identified in the reference and random sets. Contrary to the expectation under chance, this analysis shows significant enrichment of shorter genes in the Y and G reference sets compared with the random sets. Specifically, the mean number of SNPs in genes identified in the Y (G) set was 28.3 (25.8), whereas the mean number of SNPs in genes identified in the Y- (G-) related random sets was 49.2 (29.4). This difference was highly significant, with p=1.4×10⁻⁴⁷ for the Y set and p=4.6×10⁻⁷ for the G set (Figure 2). This finding provides further robust support for a functional difference between the random and the reference SNP sets. The significantly larger number of shorter genes in the reference sets partly explains the larger number of genes in these sets.
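The expectation that random SNP draws favour long genes is a size-biased sampling argument: if a gene contains c SNPs, a single randomly drawn SNP falls in it with probability proportional to c, so the expected SNP count of the gene hit is E[c²]/E[c] rather than the plain mean E[c]. The sketch below illustrates this with hypothetical gene sizes and only genic SNPs; it does not reproduce the FHS annotation or the test behind the reported p-values.

```python
import random
from typing import List


def mean_snps_of_hit_genes(snps_per_gene: List[int], n_snps: int, seed: int = 0) -> float:
    """Draw n_snps SNPs uniformly without replacement from all genic SNPs and return
    the mean SNP count of the distinct genes that were hit."""
    rng = random.Random(seed)
    # One entry per SNP, labelled with the index of the gene it falls in
    snp_to_gene = [g for g, count in enumerate(snps_per_gene) for _ in range(count)]
    hit = set(rng.sample(snp_to_gene, n_snps))
    return sum(snps_per_gene[g] for g in hit) / len(hit)


def size_biased_mean(snps_per_gene: List[int]) -> float:
    """Expected SNP count of the gene hit by one random SNP: E[c^2] / E[c]."""
    return sum(c * c for c in snps_per_gene) / sum(snps_per_gene)


if __name__ == "__main__":
    genes = [1] * 50 + [100] * 50  # hypothetical: 50 short and 50 long genes
    print("plain mean:", sum(genes) / len(genes))
    print("size-biased mean:", round(size_biased_mean(genes), 2))
    print("mean over genes hit by 200 random SNPs:", round(mean_snps_of_hit_genes(genes, 200), 1))
```

Under this null, genes hit by random SNPs are skewed toward long genes, which is why an excess of short genes in the reference sets runs against the chance expectation.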
Third, SNPs from the Y and G sets were associated with the same phenotype of premature death, although of different severity, whereas SNPs in the random sets were drawn regardless of any connection (except by chance) with health- and/or lifespan-related phenotypes. Accordingly, if the associations of SNPs from the Y and G sets with premature death are plausible, SNPs from these sets should overlap non-randomly. We found that 1541 SNPs overlap between the Y (N=3970) and G (N=20254) reference sets. Analysis of pairs of SNP sets from the Y- and G-related control random sets shows that the stochastic overlap between them averages only 385 SNPs. The difference between these overlaps is highly significant, p=8.6×10⁻³⁵ (Figure 3). This result indicates that the overlap of SNPs from the Y and G sets is unlikely to be due to chance, and it highlights a structural difference between the reference and random SNP sets.
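How large an overlap to expect between two unrelated SNP sets can also be sketched analytically. If sets of sizes a and b are drawn independently and uniformly from the same pool of N qualified SNPs, the overlap follows a hypergeometric distribution with mean ab/N. This is a swapped-in analytic view, not the comparison with paired random sets used in the paper, and the pool size N is not given in this section, so any value supplied for it is an assumption.

```python
from math import exp, lgamma


def log_comb(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k), valid for 0 <= k <= n."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def expected_overlap(a: int, b: int, universe: int) -> float:
    """Mean overlap of two independent uniform random subsets of sizes a and b."""
    return a * b / universe


def overlap_upper_tail(observed: int, a: int, b: int, universe: int) -> float:
    """P(overlap >= observed) under the hypergeometric null, summed in log space
    so that large set sizes do not overflow."""
    lower = max(observed, a + b - universe)  # overlaps below this are impossible
    logs = [log_comb(a, k) + log_comb(universe - a, b - k) - log_comb(universe, b)
            for k in range(lower, min(a, b) + 1)]
    if not logs:
        return 0.0
    top = max(logs)
    return exp(top) * sum(exp(x - top) for x in logs)
```

For example, `expected_overlap(3970, 20254, N)` with an assumed pool size N gives the overlap expected by chance for sets of the Y and G sizes.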
This evidence, along with that on enrichment of non-synonymous coding SNPs in the reference sets, suggests that genes harboring SNPs in genome-wide LD, selected by their association with premature death, should have relevant biological functions allowing them to work in a coordinated fashion. It is therefore important to gain deeper insight into this role.
The most significant (p<10⁻⁴) enrichment of genes from the Y and G sets in general GO categories was observed in developmental process terms and in a number of other complex, biologically important processes that drive many aspects of development (Figure 4). The full list of enriched BPs is given in Supplementary Information, Tables S2 and S3. These tables show significant enrichment of genes from the reference sets in higher-level GO terms as well as in a variety of lower-level categories, partly overlapping through the GO hierarchy. The larger size of the G set results in a longer list of significant BPs than in the Y set at different hierarchical levels. Importantly, higher-level categories related to growth (e.g., developmental cell growth) and the cell cycle (e.g., cell cycle process, mitotic cell cycle, cell cycle phase) mostly characterized the G set, i.e., they were significant neither in the Y set nor in the majority of the G-set-related control random SNP sets.
Although high-level GO terms per se provide limited information for biological interpretation of the results (e.g., due to GO structure [35,36] and biases of pathway analyses [35,44]), enrichment of genes in the aforementioned categories suggests that a specific feature of the Y and G sets is an aggregation of genes in BPs essential for development (Figure 4). This general architecture of the reference sets is supported by enrichment of genes in a variety of biologically interrelated specific processes that are characteristic of these sets but, as a rule, not of the random sets (a selected list of such processes is shown in Table 1). The next sections provide detailed biological analyses of developmental processes enriched in the Y and G sets (see “Developmental processes”) and of interlinks between selected Y-set-specific developmental and signaling processes (see “Functional crosstalk: the Wnt signaling”). Analyses of the other major processes sketched in Figure 4 are detailed in Supplementary Information, Text “Biological analyses”. These analyses highlight a systemic structure of the Y and G sets that is lacking in the control random SNP sets.
The general GO term developmental process included a variety of significant sub-categories, typically in a strong hierarchical sequence. Highly significant enrichment of genes from the Y and G sets (p<10⁻⁴) was observed in large BPs related to organismal development, morphogenesis, and cell differentiation. BPs related to developmental growth were specific to the G set (e.g., developmental growth involved in morphogenesis, pG=0.008 vs. pY=0.358).
Morphogenesis, differentiation and growth are BPs controlling fundamental aspects of the developmental program. Accordingly, enrichment of genes in these processes suggests that the Y and G sets are likely linked to the program of embryonic development. This is also supported by enrichment of genes in the more specific, large GO term in utero embryonic development (pY=0.024 and pG=0.0009) and in a number of specific BPs related to gastrulation.
Gastrulation, an early stage of embryogenesis that determines the basic body plan, is represented by the Y-set-specific terms placenta development (pY=0.032), branching involved in labyrinthine layer morphogenesis (pY=0.021), gastrulation with mouth forming second (pY=0.026), and formation of primary germ layer (pY=0.053). The development of the labyrinthine layer is among the earliest steps of placenta formation, and patterning along the primary anterior–posterior (A-P) axis occurs first in development. These specific processes suggest relevance of the Y set to an earlier stage of embryogenesis.
Paraxial mesoderm formation (pY=0.013) specifies mesoderm formation in the Y set, whereas axial mesoderm morphogenesis (pG=0.016) is a G-set-specific term. The unsegmented cranial paraxial mesoderm contributes particularly to the neurocranium (the part of the skull that protects the brain). Accordingly, the small process embryonic neurocranium morphogenesis (pY=0.023) characterizes the Y set. The paraxial mesoderm (but not the axial mesoderm) has been suggested as a source of signals inducing the neural crest [50,51] (neural crest cell development, pY=0.071 vs. pG=0.296). The cranial paraxial mesoderm, together with cranial neural crest cells from the hindbrain, migrates into the developing pharyngeal arches [51,52]. This is consistent with gene enrichment in pharyngeal system development (pY=0.003).
Also, the neural plate, the primordium of the central nervous system (CNS), can be patterned by signals from the paraxial mesoderm [53,54]. The specific terms neural plate regionalization (pY=0.007) and neural plate anterior/posterior regionalization (pY=0.003) are significant neither in the G set nor in any control random set; they logically complement the earlier developmental processes characteristic of the Y set discussed above. Regionalization of the neural plate is the first step in neural patterning, establishing the initial polarity of the neural plate along the A-P axis.
The development of the nervous system is one of the most important processes in the embryo, to the extent that a so-called neural default model has been proposed. Our results show enrichment of genes involved in nervous system development, neurogenesis (all pY,G<10⁻⁴), and central nervous system development (pY<10⁻⁴ and pG=0.002), along with a variety of related sub-categories (Supplementary Information, Tables S2 and S3).
Brain development (pY=0.003 and pG=0.028) in both sets is mostly represented by sub-categories of forebrain development, which are more pronounced in the G set. Cerebral cortex development (pG=0.0003 vs. pY=0.014) is specified in the G set by the child term superior temporal gyrus development (pG=0.001). The superior temporal gyrus contains the primary auditory cortex. The G set genes are enriched in this process in parallel with enrichment in other terms linked to developmental processes of the auditory pathway (e.g., inner ear receptor cell differentiation, auditory receptor cell differentiation; see “Sensory organ development”). Further, the superior temporal gyrus is part of a brain network that includes the orbitofrontal cortex [57,58], which is also represented here by orbitofrontal cortex development (pG=0.014). The orbitofrontal cortex integrates auditory, visual and somatosensory inputs, participates in learning, prediction and decision making, and controls behavior [59,60]. Again, this is in line with gene enrichment in the large terms learning or memory, learning, locomotory behavior, visual behavior, startle response, and sensory perception of mechanical stimulus in the G set (Supplementary Information, Table S3).
Development of the hindbrain, one of the three primary regions of the brain, is specified in the Y set by the child terms cell migration in hindbrain (pY=0.006) and cerebellum development (pY=0.025), along with cerebellum morphogenesis and cerebellar cortex morphogenesis. The hindbrain, a key source of patterning information in the developing head, controls basic life-supporting functions (e.g., breathing and heart rate) and reflexes. Hindbrain neurons also supply motor innervation. Motor neuron activity leads directly to muscle contraction. The cerebellum coordinates and controls motor activity (e.g., body movement, balance, coordination) through the cerebro-cerebellar network. As discussed below, other processes associated with motor control and activity that are early events of prenatal development [64,65], e.g., motor axon guidance (see “Neurogenesis”) and neuromuscular process controlling balance (see “Sensory organ development”), are consistently specific to the Y set.
Enrichment of genes from the Y and G sets in neurogenesis and the lower-level sub-categories in hierarchic sequence (Figure 5) is primarily related to axonal outgrowth during neuron development. Genes are also enriched in central nervous system neuron axonogenesis (pY=0.003 and pG=0.005) and other related terms.
Directed axon outgrowth is an essential process for the formation of neuronal networks during embryogenesis. Axon guidance in both sets is specified by the child terms dorsal/ventral axon guidance (pY=0.023 and pG=0.014) and retinal ganglion cell axon guidance (pY=0.008 and pG=0.003), which parallel axial patterning and the development of the optic nerve containing the ganglion cell axons running to the brain (see “Sensory organ development”).
The Y set is characterized by the small term motor axon guidance (pY=0.0005 vs. pG=0.187), in line with a number of the Y-set-specific terms discussed in “Gastrulation”. Hindbrain, or cranial, motor neurons project their axons to the skeletal (striated) muscles, which in turn are derived from the paraxial mesoderm. The face and neck muscles developing from the pharyngeal arches are striated muscles as well. Skeletal muscle organ development (pY=0.038), skeletal muscle tissue development (pY=0.037) and the more specific process myotube differentiation (pY=0.024) characterize the Y set, along with other related terms observed in both sets (e.g., muscle structure development, muscle cell differentiation, myoblast fusion, and neuromuscular junction development) (Supplementary Information, Tables S2 and S3). A myotube is a skeletal muscle fiber formed by the fusion of myoblasts during development. Motor axons contact myotubes in skeletal muscles [68,69]. In addition, genes are enriched in the Y-set-specific child terms astrocyte development and astrocyte activation (pY=0.003). Astrocytes are essential for axon guidance, synapse formation, neuronal homeostasis and myelination; they are also implicated in cerebellar development and motor control. Thus, this system of specialized processes in the Y set (see also Table 1) highlights a specific stage related to the earlier development of the motor control system.
Interestingly, the leading SNP rs1390694 in the Y set (see “Methods”) is located in the FGGY gene, which is highly expressed in fetal brain and is implicated in neuron homeostasis. FGGY may confer susceptibility to sporadic amyotrophic lateral sclerosis (ALS) [71,72]. ALS, the most common motor neuron disease, involves astrocyte-mediated toxicity to motor neurons. Involvement of the cerebellum in ALS has also been suggested, which again is in line with the Y-set-specific enrichment in development of this part of the brain and of the basic motor control system. Moreover, demyelination may be a secondary symptom in ALS patients, arising after muscle atrophy [75,76]. FGGY encodes a protein that phosphorylates carbohydrates such as ribulose, ribitol and L-arabinitol. Specifically, ribitol is an essential component of riboflavin (vitamin B2), which is important for fetal development and has been implicated in riboflavin/free radical-induced axonal degeneration and in demyelination as well [77,78]. Accordingly, a more complex interplay of ALS determinants can be expected.
After neuronal differentiation, axons extend towards their target cells. Axon extension (pG=0.009) is significant in the G set, in agreement with gene enrichment in the related term developmental cell growth (pG=0.008).
Axons and dendrites are the two types of neuronal processes involved in the conduction of nerve impulses in the CNS. Axons extend rapidly and begin to develop first. Dendrite-related terms (e.g., dendrite development, pG=0.019 vs. pY=0.13) are specific to the G set. Neuron recognition (pG=0.008) is necessary for determining synaptic connectivity to generate early behaviors [80,81], although the first functional networks may develop without recognition of “correct” target neurons.
Like other developmental processes, generation of the nervous system occurs in stages. Because the terms dendrite development and neuron recognition, along with behavior, learning, and startle response, are not enriched in the Y set, this set likely specifies an earlier stage of development. Importantly, the absence of enrichment of these processes in the Y set contrasts with their enrichment (at p<0.05) in the majority (>18 of 20) of the Y-set-related control random SNP sets, which is largely due to gene length bias.
The development of normal brain organization requires input via all the major sensory systems. In both sets, development of the sensory organs is characterized by specific terms related to the development of the olfactory bulb and nose, e.g., olfactory bulb interneuron development (pY=0.002 and pG=0.001), chemorepulsion involved in embryonic olfactory bulb interneuron migration (pY=0.0007 and pG=0.014), and nose development (pY=0.018); the eyes, e.g., optic nerve morphogenesis (pY=0.043 and pG=0.004); and the inner ear, e.g., inner ear receptor cell differentiation (pY=0.011 and pG=0.002), inner ear receptor stereocilium organization (pY=0.019), and auditory receptor cell differentiation (pG=0.013).
The olfactory system is one of the earliest sensory systems to develop in the human embryo. The optic nerve is derived from retinal ganglion cells and contains the ganglion cell axons running to the brain (see “Neurogenesis”). The inner ear contains the sensory organs for balance and hearing. The hearing function parallels enrichment in the neurological system process sensory perception of sound (pY=0.001 and pG=0.002) in both sets. The balance function of the inner ear is in line with cerebellum development (the cerebellum controls balance) and neuromuscular process controlling balance (pY=0.018) in the Y set.
The sensory organs, such as the nose, the ear, and the eye, appear very early in development as specialized surface regions (placodes). Enrichment of the Y set genes in olfactory placode formation (pY=0.043 vs. pG=0.671) and optic placode formation (pY=0.023 vs. pG=0.566) supports a connection of the Y set with an earlier developmental stage of these sensory organs. Meanwhile, progress in the development of the inner ear and the eyes is in line with gene enrichment in the G set in processes related to development of the superior temporal gyrus and the orbitofrontal cortex, as well as in the related neurophysiological processes and behaviors (see “Brain development”). This connection supports a more pronounced role of the G set at a later stage of development of the sensory pathways. This functional difference between the Y and G sets suggests that embryonic induction of the inner ear, in particular, may proceed in parallel with the development of the responding organs and tissues, and may be controlled at different stages by distinct sets of genes.
Thus, the Y set emphasizes the earlier processes of embryogenesis related to the formation of the head and the establishment of the network of neuromuscular connections in the head region.
The development of the nervous system is paralleled by the development of the cardiovascular (CV) system, which is the first major system to function in the human embryo. Among processes related to earlier events of cardiogenesis and blood vessel development that are characteristic of the Y set, we observe cardiac atrium development (pY=0.031) along with venous blood vessel morphogenesis (pY=0.023).
Formation of de novo blood vessels (vasculogenesis) begins from endothelial cell precursors (angioblasts). Endothelial progenitor cells from the paraxial mesoderm, whose formation is characteristic of the Y set, have angioblastic capacity. Endothelial cell differentiation (pY=0.035), together with endothelium development (pY=0.041) and regulation of vasoconstriction (pY=0.049), is also characteristic of the Y set. In addition, the specific process lymphangiogenesis (pY=0.021) characterizes early development of the lymph vessels. Lymphatic vessels arise via transdifferentiation of venous endothelial cells. Endothelial cells line vessels and regulate the flow of body fluid. The blood and lymphatic vascular systems function in concert, as reflected in the term regulation of body fluid levels (pY=0.026 and pG=0.0008).
The BPs aorta smooth muscle tissue morphogenesis (pY=0.008) and vein smooth muscle contraction (pY=0.045), along with smooth muscle cell proliferation (pG=0.009) and smooth muscle tissue development (pG=0.005), are plausibly connected with developmental processes of the circulatory system, because involuntary smooth muscle forms the muscle layers in the walls of blood vessels.
Genes from the G set, in contrast to the Y set, are predominantly enriched in cardiovascular system development (pG<10⁻⁴ vs. pY=0.038) and, specifically, in heart development (pG<10⁻⁴ vs. pY=0.028), blood vessel development (pG=0.008 vs. pY=0.118) and sprouting angiogenesis (pG=0.007). The latter is a specific process crucial for the development of new blood vessels from pre-existing ones. Heart development is paralleled by cardiac muscle fiber development (pG=0.034) in the G set. Gene enrichment in these BPs supports a connection of the G set with a later stage of embryogenesis, when development and functioning of the CV system are in progress.
The highly significant higher-level term anatomical structure development (pY,G<10−4) is specified by a substantially wider array of subcategories in the G set than in the Y set. This difference is consistent with the potential link of the G set with a later stage of embryogenesis and with the initial development of several organs. In particular, enrichment of genes in head development (pG=0.004) and face morphogenesis (pG=0.006) is paralleled by the prevalence of BPs related to the forebrain (see “Brain development”), which acts as a structural support for facial development, and to the axial mesoderm (see “Gastrulation”), which comprises the head process. The development of complex asymmetric organs is represented in the G set not only by heart development but also by liver development (pG=0.027) along with hepaticobiliary system development (pG=0.012). The lungs (respiratory system development, pG=0.049) are also markers of left-right (L-R) asymmetry. Renal system development (pG=0.046) likewise requires both L-R and A-P patterning signals to determine the kidney field.
Female sex differentiation (pG=0.007), development of primary female sexual characteristics, female gonad development, and apoptosis involved in luteolysis represent the category developmental process involved in reproduction.
Mesenchymal cell development (pG=0.043), morphogenesis of an epithelium (pG=0.023), and columnar/cuboidal epithelial cell differentiation (pG=0.012) characterize progress in the development of the connective and epithelial tissues, complementing the development of the two other basic tissue types (nervous and muscular). The mammary gland and the salivary gland develop as a result of mesenchymal-epithelial interactions. Accordingly, the specific processes salivary gland development, together with its child term dichotomous subdivision of terminal units involved in salivary gland branching (pG=0.0008), and mammary gland development, together with mammary gland duct branching involved in pregnancy (pG=0.034), characterize gland morphogenesis (pG=0.009). Epithelio-mesenchymal interactions are also essential for a series of other events, including those represented by G-set-specific GO terms (e.g., cell-cell junction organization and cell-matrix adhesion; see “Cellular component organization” and “Biological adhesion” in Supplementary Information, Text “Biological analyses”).
Thus, the systems of interlinked developmental BPs enriched in the Y and G sets suggest that genes from these sets are involved in regulating processes characteristic of embryonic development. The Y set of genes appears more characteristic of the early stages, whereas the G set is more characteristic of a later stage. Importantly, epigenetic processes crucial for normal early development in mammals, namely genetic imprinting and regulation of gene expression by genetic imprinting (pY=0.003, qY=0.05 and pG=0.029, qG=0.21), are characteristic of both reference sets, in contrast to the control random sets (see also Supplementary Information, Text “Biological analyses”).
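The p-values in this section are occasionally accompanied by q-values (e.g., qY=0.05 and qG=0.21 for genetic imprinting). The text does not state how these q-values were computed; one common choice in GO enrichment tools is the Benjamini-Hochberg false discovery rate adjustment. The following Python sketch shows that adjustment under this assumption; the function name `bh_qvalues` and the toy p-values are hypothetical and are not taken from any tool or data used in this study.

```python
from typing import List


def bh_qvalues(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order.

    Assumption: q-values are BH-adjusted p-values; the method actually used
    for the reported q-values is not specified in the text.
    """
    m = len(pvalues)
    # Indices of the p-values sorted in ascending order
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0  # also caps every q-value at 1
    # Walk from the largest p-value to the smallest so q-values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Toy p-values for four hypothetical GO terms (not data from this study)
print(bh_qvalues([0.01, 0.04, 0.03, 0.5]))
```

Because each q-value is a p-value scaled by m/rank and then made monotone, a term can be nominally significant yet have a q-value well above 0.05 when many terms are tested, as with genetic imprinting in the G set (pG=0.029, qG=0.21).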
As an example, the next section details the results of biological analyses of the interlinks between selected Y-set-specific developmental and signaling processes.
Wnt signaling plays a critical role in embryonic development and in the maintenance of adult tissues [99,100]. The Wnt receptor signaling pathway is more pronounced in the Y set (pY=0.005 vs. pG=0.064). It is specified by regulation of Wnt receptor signaling pathway (pY=0.009 and pG=0.023) and by its child term negative regulation of Wnt receptor signaling pathway (pY=0.016). Gene enrichment in Wnt signaling is consistent with the enrichment of a number of Y-set-specific interlinked processes in which Wnt signaling is implicated, including (among others) early placental differentiation, development of the neural crest and pharyngeal system, paraxial mesoderm formation, specification of skeletal muscle cells, and cranial skeleton development. The key role of Wnt family proteins in A-P patterning has been confirmed [48, 107], including in establishing the oral-aboral axis and in the initial A-P regionalization of the neural plate. Wnt signaling in the growth cone is a general, evolutionarily conserved mechanism of axon guidance. Specifically, Wnts mediate A-P axon guidance of motor neurons [110, 111]. Wnt signaling also plays an essential role in establishing the embryonic cerebellum and in forming synapses and neuromuscular junctions, and it regulates astrocyte development. These Y-set-characteristic developmental events in which Wnt signaling is implicated are sketched in Figure 6 (see also “Developmental process”).
The best-studied, canonical Wnt/β-catenin pathway (regulation of canonical Wnt receptor signaling pathway, pY=0.016 and pG=0.017) controls diverse BPs. Among others, Wnt/β-catenin signaling regulates the formation of body axes and dorsal structures, stem cell maintenance, skin and cardiac development, and angiogenesis. This is also in line with a number of G-set-specific developmental processes (see “Developmental process”).
Moreover, Wnt and BMP (a member of the TGFβ family) signals pattern the dorsal spinal cord, whereas G-set-specific Sonic hedgehog signaling is implicated in patterning the ventral region (see Supplementary Information, Text “Biological analyses”). The interaction between Wnt and BMP signaling is further supported by the Y-set-specific small terms dorsal spinal cord development and BMP signaling pathway involved in spinal cord dorsal/ventral patterning. Also, regarding motor control and the Y-set FGGY gene (see “Neurogenesis”), evidence indicates that Wnt- and BMP-dependent signaling may play relevant roles in the neurodegeneration of motor neurons in the context of ALS.
Wnt, BMP/TGFβ, and Sonic hedgehog function as morphogens early in development and also mediate axon guidance [117, 118]. Given the observed differences between the Y and G sets in the signaling of these morphogens and in the related developmental processes, as well as the apparent interlink between temporally coordinated processes regulating A-P and dorsal-ventral (D-V) patterning and axon guidance, we suggest that at least some of the SNPs shared by the Y and G reference sets may be linked to genes that are associated, at different time points, with specific early developmental processes in the A-P, D-V, and L-R patterning pathways.
Thus, this section highlights time- and context-specific crosstalk between specific signaling events and developmental processes. Further support for our conclusions on the connections of the Y and G sets of genes with embryogenesis, and for the interlink of developmental processes with other groups of BPs that ensure proper development at different levels of the GO hierarchy, is detailed in Supplementary Information, Text “Biological analyses”.
This work provides a comprehensive analysis of the biological properties of two inheritable, extensive clusters of SNPs in genome-wide LD that were previously identified as correlates of premature death in the FHS. The results support three important findings. First, the SNPs in genome-wide LD are unlikely to have been selected by chance. Second, genes for these SNPs are enriched in BPs that are particularly important for the development of an organism in utero. Third, the results highlight the importance of systemic analyses of the biological machinery involved in regulating complex phenotypes.
A striking result was the apparent functional and structural difference between SNPs in LD and those in the matching random sets: the Y and G sets of SNPs in genome-wide LD differ substantially in function and structure from matching sets of randomly selected SNPs. These differences are not driven by genotyping errors, a problem of inherent concern in genome-wide association studies. The evidence is as follows: (i) SNPs from the reference sets are highly significantly enriched in genes compared with SNPs from the matching random control sets (Figure 1); (ii) SNPs from the reference sets are highly significantly enriched in shorter genes, in contrast to SNPs from the random sets (Figure 2); and (iii) the difference between the overlap of SNPs in the two reference sets and the overlaps of SNPs in pairs of random sets is highly significant (Figure 3). This evidence is supported by our prior finding of enrichment of non-synonymous coding SNPs in the reference sets. These functional and structural dissimilarities suggest that networks of genes from the Y and G sets likely play a profound biological role.
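The logic of comparing a reference set with matching random control sets can be illustrated with an empirical p-value: compute a statistic (here, the fraction of SNPs that fall within genes) for the reference set and for each random set, then count how often a random set is at least as extreme. This is a generic sketch, not the procedure used in the paper, which this section does not describe; the SNP identifiers, the `genic` annotation, and the function names are hypothetical. With n random sets the smallest attainable empirical p-value is 1/(n+1), so p-values as small as those reported for gene enrichment (e.g., p=7.1×10−22) would require a different kind of test with any realistic number of random sets.

```python
from typing import Iterable, List, Set


def fraction_in_genes(snp_ids: List[str], genic_snps: Set[str]) -> float:
    """Fraction of SNPs in a set that map within a gene (hypothetical annotation)."""
    return sum(1 for s in snp_ids if s in genic_snps) / len(snp_ids)


def empirical_p_value(observed: float, null_stats: Iterable[float]) -> float:
    """One-sided empirical p-value: share of random sets at least as extreme.

    The +1 in numerator and denominator counts the observed set itself,
    so the result is never exactly 0 (minimum 1 / (n_random + 1)).
    """
    null_stats = list(null_stats)
    exceed = sum(1 for s in null_stats if s >= observed)
    return (exceed + 1) / (len(null_stats) + 1)


# Toy example with hypothetical SNP IDs (not study data)
genic = {"rs1", "rs2", "rs3", "rs7"}
reference = ["rs1", "rs2", "rs3", "rs4"]        # 3 of 4 SNPs in genes
random_sets = [["rs4", "rs5", "rs6", "rs7"],    # 1 of 4
               ["rs1", "rs5", "rs6", "rs8"],    # 1 of 4
               ["rs2", "rs3", "rs5", "rs9"]]    # 2 of 4

obs = fraction_in_genes(reference, genic)
null = [fraction_in_genes(s, genic) for s in random_sets]
print(obs, empirical_p_value(obs, null))  # prints: 0.75 0.25
```

The same scheme applies to other set-level statistics mentioned in the text, such as the fraction of SNPs in short genes or the size of the overlap between two sets, by swapping in a different statistic function.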
Analyses using bioinformatics tools highlighted highly significant enrichment of genes in the developmental process term and in major BP GO terms tightly linked with development. However, high-level BPs provide limited information for biological interpretation of the results and for insight into the specifics of networks of interlinked BPs generated during embryogenesis. Given that we deal with genes for SNPs in genome-wide LD, we expect a system of more informative, specific BPs with well-defined cross-talk. To gain further insight into the biological role of networks of genes for SNPs from the Y and G sets, we conducted an in-depth, manual, targeted analysis of the literature and public databases. The focus was on revealing well-documented interlinks between specific BPs at lower levels of the GO hierarchy. These analyses focused on nominally significant functional terms because a coherent set of biologically interlinked terms, each nominally enriched, is less likely to reflect false enrichment than an isolated term.
The in-depth targeted biological analyses yielded a striking result: genes from the reference sets are enriched in a variety of interlinked BPs that represent a system of time-synchronized events characterizing different stages of embryonic development of specific organs, tissues, and systems. This is a hallmark of both reference sets.
In both reference sets, in contrast to the control random SNP sets, genes are enriched in BPs related to genetic imprinting (an epigenetic mechanism of inheritance that produces a parent-of-origin-specific pattern of gene expression) and to carbohydrate metabolism (consistent with glucose being the principal metabolic fuel of the growing embryo).
Our analyses suggest that genes from the Y set control earlier events in embryogenesis and neurulation, with emphasis on A-P patterning and on organization of the head region. The analyses show coherent enrichment of the Y set genes in processes related to the initial development of the hindbrain (i.e., cerebellum), sensory organs, skeletal muscles, and astrocytes, and to axonal outgrowth of motor neurons. These and other findings point to the formation of a basic apparatus for motor control as an early event of embryonic and fetal development, in accordance with the priority of organizing and producing movement for primary organism functioning and survival and for the development of simple learning and behavioral processes [64,120].
Genes in the G set are enriched in cell cycle and developmental cell growth processes, along with specific processes related to the cytoskeleton. They are involved in a system of BPs highlighting progressive development of the nervous and cardiovascular systems. This set is also characterized by progress in body patterning and axis formation, the initial development of other organs and systems, and progressing motor control, neurophysiological functions, and early behaviors, together with the accumulation of relevant signaling and metabolic events.
Our results also suggest that the biological roles of two leading genes (FGGY and TUBB4B) are consistent with the key groups of specific BPs characterizing the Y and G sets (as partly summarized in Table 1). Given these encouraging results, a more detailed analysis of the functional roles of other specific genes from these sets is underway.
Our analyses indicate that different stages in the embryonic induction of individual organs (e.g., the inner ear) can be paralleled by the development of responding organs and tissues and can be regulated independently by the Y and G sets. In this regard, the systems of interlinked BPs may pinpoint different sets of biologically relevant genes, drawn from the total list of genes defining these BPs, at different stages of an organism's development and growth. This implies that specific sets of interlinked genes from different functional pathways may work together to ensure the sequential development of specific organs and systems. Alterations in the functions of these specific sets of genes during embryogenesis may lead to the development of diseases in late life (supported, in particular, by findings connecting fetal development with neurological processes in later life). For example, alterations in the Y-set-specific synchronous development of the early cerebellum, skeletal muscles, and axonal outgrowth of motor neurons, linked with the development and activation of astrocytes, may result in the development of ALS in late life (see “Nervous system development”).
Analyses using bioinformatics tools are often limited to functional terms with a larger number of genes, on the argument that enrichment statistics for terms with few genes may be unreliable. However, the larger a GO term is, the less specific the biological information it carries. The in-depth analyses of cross-talk between biological processes conducted in this paper suggest that specific BPs are, in general, more informative for biological analysis than higher-level terms. Indeed, small child terms characteristic of the Y and G sets highlight systems of coherent BPs that characterize coordinated events during embryonic development. These BPs also pinpoint differences between the systems of embryonic events characteristic of the Y and G sets. Importantly, in contrast to the Y and G sets, no such systems of coherent BPs were observed in the matching control random sets. Consequently, confining bioinformatics analyses to higher levels of the GO hierarchy may lead to a loss of biological information, and analysis of small GO terms is needed to gain more specific insight into the underlying biology, which has high potential to be translated into clinical interventions. Accordingly, analyses using bioinformatics tools should be considered a preliminary step aimed at pinpointing potentially relevant specific processes (e.g., at a nominal level of significance). These results should then be complemented by targeted, systemic biological analysis highlighting functional interlinks between the selected processes.
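The concern about small GO terms can be made concrete with the hypergeometric (one-sided Fisher's exact) test that many GO enrichment tools use; this section does not name the test applied by the bioinformatics tools, so treating it as hypergeometric is an assumption. The Python sketch computes P(X ≥ k), the probability that a study set of n genes drawn from a background of N genes contains at least k of the K genes annotated to a term. The function name `hypergeom_enrichment_p` and the background size of 20,000 genes are hypothetical; 1764 is the size of the smaller gene network reported in this work.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """Upper-tail hypergeometric p-value P(X >= k).

    k: study-set genes annotated to the GO term
    n: size of the study set (e.g., a gene network)
    K: background genes annotated to the term
    N: size of the background gene universe
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("require 0 <= K <= N and 0 <= n <= N")
    upper = min(n, K)
    # Sum the probabilities of observing exactly i annotated genes, i = k..upper
    tail = sum(comb(K, i) * comb(N - K, n - i)
               for i in range(max(k, 0), upper + 1))
    return tail / comb(N, n)


# Hypothetical sizes: a 1764-gene study set in a 20,000-gene background.
# For a 3-gene term the p-value can take only a few discrete values,
# so a single extra hit changes it by a large step.
for hits in range(4):
    print(hits, hypergeom_enrichment_p(hits, 1764, 3, 20000))
```

The coarse, discrete ladder of attainable p-values for a very small term is one reason its enrichment statistic is fragile on its own, which is consistent with the strategy described above of checking whether nominally significant small terms form biologically coherent, interlinked systems.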
Taken together, our results suggest that coordinated action of BPs during embryogenesis may generate genome-wide LD. The resulting genome-wide networks of genes and genetic variants may control complex phenotypes in late (post-reproductive) life. These results challenge our understanding of the role of genes in complex phenotypes and suggest an important role of genomics in health span and lifespan.
The research reported in this paper was supported by Grants No 1P01 AG043352-01 (AIY, AMK) and 1R01 AG047310 (AMK) from the National Institute on Aging. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. The Framingham Heart Study is conducted and supported by the National Heart, Lung, and Blood Institute (NHLBI) in collaboration with Boston University (Contract No. N01-HC-25195). This manuscript was not prepared in collaboration with investigators of the Framingham Heart Study and does not necessarily reflect the opinions or views of the Framingham Heart Study, Boston University, or NHLBI. Funding for SHARe Affymetrix genotyping was provided by NHLBI Contract N02-HL-64278. SHARe Illumina genotyping was provided under an agreement between Illumina and Boston University. This manuscript was prepared using a limited access dataset obtained from the NHLBI and the Framingham SHARe data obtained through dbGaP (accession number phs000007.v14.p5).
AUTHOR CONTRIBUTIONS
AMK and IC contributed to the study conception, design, interpretation of the results, and writing of the manuscript. AMK performed the statistical analysis. IC performed the biological analysis. AIY contributed to the discussion and interpretation of the results and to drafting the manuscript.