Large tracts of extended homozygosity are more prevalent in outbred populations than previously thought. With the advent of high-density genotyping platforms, regions of extended homozygosity can be accurately located, allowing for the identification of rare recessive risk variants contributing to disease. We compared measures of extended homozygosity (greater than 1 megabase in length) in a population of 837 late onset Alzheimer’s disease (LOAD) cases and 550 controls. In our analyses, we identified one homozygous region on chromosome 8 that is significantly associated with LOAD after adjusting for multiple testing. This region contains seven genes, of which the most biologically plausible candidates are STAR, EIF4EBP1 and ADRB3. We also compared the total number of homozygous runs and the total length of these runs between cases and controls, finding a suggestive difference in these measures (p-values 0.052-0.062). This research suggests a recessive component to the etiology of LOAD.
Recent research has noted that the human genome contains a higher frequency of extended tracts of contiguous homozygous single nucleotide polymorphisms (SNPs) than previously expected. The occurrence of these long homozygous tracts in uninterrupted sequences may represent deletion polymorphisms, loss of heterozygosity, segmental uniparental disomy, low minor allele frequency, or autozygosity. Recent data suggest that these tracts most likely represent extended homozygosity by virtue of parental descent from a common ancestor. The longest tracts are expected to occur in more recently inbred populations, with subsequent recombination events interrupting long chromosomal segments. Our group has reported an unexpectedly high degree of apparent parental consanguinity in control individuals from North America (9.5% of studied individuals harboring homozygous tracts larger than 5 Mb). Similar numbers (6.6%) were presented by Li et al. when studying an outbred population of unrelated Han Chinese, and by Gibson et al., who reported that 1393 tracts exceeding 1 Mb in length were observed in the 209 unrelated HapMap individuals studied. The same conclusion was reached by McQuillan and colleagues in a recent manuscript. Using pedigrees from two isolated and two more cosmopolitan populations of European origin, they demonstrated that runs of homozygosity up to 4 Mb are common in outbred individuals.
The development of platforms able to genotype over one million SNPs has provided an unparalleled opportunity to study tracts of extended homozygosity. One obvious application of this technology is the study of recessive families. Our group recently reported a whole genome analysis of a consanguineous family with early onset Alzheimer’s disease (EOAD). This study presented the first catalog of extended homozygosity in EOAD and identified several regions as candidate loci for a recessive genetic lesion in AD. The same technology may also be used to perform homozygosity mapping for disease in ostensibly outbred populations with unknown pedigree structures.
Alzheimer’s disease (AD) is a complex, multifactorial disorder in which a genetic component has been well established. A very small percentage of AD cases have an onset before 60-65 years of age and are classified as early onset AD. Most of these cases present a familial aggregation of the disease, and some of them show an autosomal dominant pattern of inheritance. In these latter cases the disease may be associated with the presence of mutations in three genes: APP (amyloid precursor protein, OMIM 104760), PSEN1 (presenilin 1, OMIM 104311) and PSEN2 (presenilin 2, OMIM 600759).
The great majority of Alzheimer cases are, however, late onset cases (LOAD) with no apparent familial aggregation. The E4 allele of the Apolipoprotein E gene (APOE, OMIM 107741) has been the only genetic factor consistently associated with both familial and sporadic forms of AD by different studies in several varied populations. Although APOE is neither necessary nor sufficient for the development of AD, its risk has been shown to be dose dependent and to correlate with the age at onset of the disease. Thus, a large part of the genetic basis of AD is still unknown, and the role of homozygosity has been considerably overlooked: only two studies of isolated populations with a high incidence of the disease have been reported, and the primary analyses were performed using dominant or additive modes of inheritance.
In order to evaluate the role of runs of extended homozygosity in LOAD, we analyzed the association between Alzheimer’s disease and quantitative measures of extended homozygosity. The underlying hypothesis of this work is that an excess of large homozygous tracts in AD patients versus controls would support the notion that a recessive component exists for the disease. This led us to perform a statistical comparison of the extended tracts of homozygosity in a series of 837 LOAD patients and 550 neurologically normal controls.
The data used for this analysis are publicly available at http://www.tgen.org/research/neuro_gab2.cfm and were generated by a genome-wide analysis of unrelated LOAD cases and healthy controls. Detailed quality control data for this publicly available dataset are described in Reiman et al., 2007. Because analyses of extended homozygosity require relatively complete genotyping coverage, we increased the genotyping success threshold to 97% successful genotypic calls per individual. This gave an average call rate per individual of ~98.5% but excluded 24 participants from the initial dataset. The total population in our analyses after exclusions (total n=1387) was drawn from three cohorts: the neuropathological discovery cohort (n=722), the neuropathological replication cohort (n=305), and the clinical replication cohort (n=360). All participants included in the initial study were of self-reported European ancestry, and population structure was evaluated using STRUCTURE. Only participants identified as genetically most similar to European HapMap samples were included in the analytic dataset, with 14 outliers removed prior to data analysis. A summary of the information available from each cohort used in the analyses is presented in Table 1. Neither individual ages nor sex were publicly available.
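The per-individual call-rate filter described above can be illustrated with a minimal Python sketch. This is not the pipeline used to prepare the dataset; the function names, the use of `None` to mark a missing call, and the toy genotypes are all assumptions made for illustration. Only the 97% threshold comes from the text.

```python
def call_rate(genotypes):
    """Fraction of non-missing genotype calls; None marks a missing call."""
    if not genotypes:
        return 0.0
    called = sum(1 for g in genotypes if g is not None)
    return called / len(genotypes)


def filter_by_call_rate(samples, threshold=0.97):
    """Keep samples whose call rate meets the threshold (0.97 in the text)."""
    return {sid: g for sid, g in samples.items() if call_rate(g) >= threshold}


# Hypothetical toy data: two participants, four SNPs each.
samples = {
    "s1": ["AA", "AG", None, "GG"],   # 3 of 4 calls present -> excluded
    "s2": ["AA", "AG", "GG", "GG"],   # all calls present -> kept
}
kept = filter_by_call_rate(samples)
```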
Runs of extended homozygosity were identified using the PLINK v1.02 software package (http://pngu.mgh.harvard.edu/~purcell/plink/contact.shtml). We used stringent criteria for inclusion of genomic regions in runs of homozygosity as a means of reducing confounding by copy number variation. We identified only large runs of homozygosity (ROHs), the primary criterion for genomic inclusion in a ROH being at least 1 Mb of consecutive homozygous genotypic calls at adjacent SNP loci. A minimum SNP density of 50 SNPs per Mb was required for inclusion in a homozygous run, so that centromeric and SNP-poor regions were algorithmically excluded from all analyses. The genome was scanned for ROHs using a sliding window of 50 SNPs, allowing at most 2 missing genotypes and 1 heterozygous call per ROH. The large ROHs identified in this way are suggestive of a recent origin, as opposed to shorter runs, which suggest an ancient origin.
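To make the inclusion criteria concrete, the following Python sketch applies them with a simple greedy scan along one chromosome. It is a simplified illustration and not PLINK's sliding-window algorithm; the function name, the `"hom"`/`"het"`/`"missing"` encoding and the example data are assumptions. The thresholds (1 Mb minimum length, 50 SNPs per Mb, at most 1 heterozygous and 2 missing calls per run) are those stated in the text.

```python
def find_roh(snps, min_bp=1_000_000, min_density_per_mb=50,
             max_het=1, max_missing=2):
    """Greedy scan for runs of homozygosity on one chromosome.

    snps: list of (position_bp, call) sorted by position, where call is
    "hom", "het" or "missing". Returns a list of (start_bp, end_bp, n_snps).
    """
    runs = []
    i, n = 0, len(snps)
    while i < n:
        if snps[i][1] != "hom":      # a run must start on a homozygous call
            i += 1
            continue
        het = missing = 0
        last_hom = i                 # index of the last homozygous call seen
        j = i
        while j < n:
            call = snps[j][1]
            if call == "het":
                if het == max_het:   # one heterozygous call too many
                    break
                het += 1
            elif call == "missing":
                if missing == max_missing:
                    break
                missing += 1
            else:
                last_hom = j
            j += 1
        # Trim the run so that it ends on a homozygous call.
        start, end = snps[i][0], snps[last_hom][0]
        n_snps = last_hom - i + 1
        length = end - start
        if (length > 0 and length >= min_bp
                and n_snps / (length / 1e6) >= min_density_per_mb):
            runs.append((start, end, n_snps))
        i = last_hom + 1
    return runs


# Hypothetical data: 60 SNPs spaced 20 kb apart with one heterozygous call.
snps = [(1_000_000 + 20_000 * k, "het" if k == 30 else "hom") for k in range(60)]
runs = find_roh(snps)
```

In this toy example the single heterozygous call is tolerated, so the whole 1.18 Mb stretch is reported as one run; a second heterozygous call would split it into pieces shorter than 1 Mb, which are discarded.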
The total length of the genome contained in homozygous runs (expressed in Mb), the average length of each homozygous run (in Mb per ROH) and the total number of homozygous runs were calculated for each participant. The total length of the genome contained in homozygous runs is the sum of the lengths of the individual runs for that participant. The average length of homozygous runs was obtained by dividing the total genomic length of the homozygous runs by the total number of homozygous runs per participant.
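The three per-participant summary measures follow directly from these definitions. A minimal Python sketch, in which the function name and the example coordinates are hypothetical:

```python
def summarize_roh(runs):
    """runs: list of (start_bp, end_bp) tuples for one participant."""
    n_runs = len(runs)
    total_mb = sum(end - start for start, end in runs) / 1e6
    mean_mb = total_mb / n_runs if n_runs else 0.0
    return {"n_runs": n_runs, "total_mb": total_mb, "mean_mb": mean_mb}


# Hypothetical participant with three runs of 1.5, 1.2 and 3.3 Mb.
example_runs = [(1_000_000, 2_500_000),
                (10_000_000, 11_200_000),
                (40_000_000, 43_300_000)]
summary = summarize_roh(example_runs)
```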
From the initial structural analysis, 1090 consensus regions from overlapping ROHs were defined. These consensus regions comprised the loci shared by all overlapping segments of extended homozygosity in a particular genomic region and were used as positional loci for identification of common ROHs in our mapping efforts. Each consensus region comprised at least 3 SNPs and 100,00 base pairs and was found in no fewer than 10 participants. The largest consensus region was 1,782,642 bp and contained 207 SNPs. The average consensus region was 351,744 bp in length and contained ~36 SNPs.
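The notion of a consensus region as the interval shared by all overlapping segments can be sketched as follows. This Python example is a simplified pooling scheme written for illustration, not the PLINK implementation; the function name and the toy coordinates are assumptions, and the minimum SNP-count and length filters are omitted for brevity. Only the minimum of 10 participants comes from the text.

```python
def consensus_regions(segments, min_participants=10):
    """segments: list of (participant_id, start_bp, end_bp) on one chromosome.

    Returns (start_bp, end_bp, n_participants) for each pool of segments
    that all share a common interval.
    """
    regions = []
    pool_ids, lo, hi = set(), None, None
    for pid, start, end in sorted(segments, key=lambda s: s[1]):
        if pool_ids and start <= hi:
            # Segment overlaps the shared interval: shrink it to the overlap.
            lo, hi = max(lo, start), min(hi, end)
            pool_ids.add(pid)
        else:
            if len(pool_ids) >= min_participants:
                regions.append((lo, hi, len(pool_ids)))
            pool_ids, lo, hi = {pid}, start, end
    if len(pool_ids) >= min_participants:
        regions.append((lo, hi, len(pool_ids)))
    return regions


# Hypothetical toy data with a lowered participant threshold.
segments = [("A", 100, 500), ("B", 200, 600), ("C", 300, 450), ("D", 1000, 1500)]
regions = consensus_regions(segments, min_participants=3)
```

Here segments A, B and C overlap and share the interval 300-450, while D stands alone and is dropped for having too few carriers.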
These consensus ROHs were analyzed using the maxT permutation test algorithm for case/control studies in PLINK v1.02. This algorithm, which tests the frequency of segmental occurrence in cases compared to that in controls, is identical to that used in copy number variant analyses of disease association, although it has been adapted to incorporate the larger ROH segments. We used 50,000 permutations of the maxT permutation testing algorithm to generate empirical p-values and association coefficients to evaluate the risk of LOAD attributable to individual runs of homozygosity. These empirical p-values were then corrected for multiple testing by accounting for the possibility of false positive results across all ROHs tested in each permutation of the label-swapping procedure.
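The logic of a maxT permutation test can be sketched in a few lines of Python. This is a generic illustration rather than PLINK's code: the function name, the choice of the absolute difference in carrier frequency as the test statistic, and the toy data are all assumptions. Case/control labels are shuffled, the statistic is recomputed for every region, and each observed statistic is compared both with its own permuted values (pointwise p-value) and with the maximum over all regions in each permutation (p-value corrected for multiple testing).

```python
import random


def maxt_permutation(carriers, is_case, n_perm=1000, seed=0):
    """carriers: one list of 0/1 flags per region (1 = participant carries it).
    is_case: list of 0/1 flags per participant (1 = case).
    Returns (pointwise_p, corrected_p), one value per region."""
    rng = random.Random(seed)

    def stats(labels):
        n_case = sum(labels)
        n_ctrl = len(labels) - n_case
        out = []
        for flags in carriers:
            case_hits = sum(f for f, lab in zip(flags, labels) if lab)
            ctrl_hits = sum(flags) - case_hits
            out.append(abs(case_hits / n_case - ctrl_hits / n_ctrl))
        return out

    observed = stats(is_case)
    point_hits = [0] * len(carriers)
    max_hits = [0] * len(carriers)
    labels = list(is_case)
    for _ in range(n_perm):
        rng.shuffle(labels)                 # label swapping
        perm = stats(labels)
        perm_max = max(perm)                # largest statistic in this permutation
        for k, obs in enumerate(observed):
            point_hits[k] += perm[k] >= obs
            max_hits[k] += perm_max >= obs
    pointwise = [(h + 1) / (n_perm + 1) for h in point_hits]
    corrected = [(h + 1) / (n_perm + 1) for h in max_hits]
    return pointwise, corrected


# Hypothetical data: 10 cases followed by 10 controls, two regions.
is_case = [1] * 10 + [0] * 10
carriers = [
    [1] * 8 + [0] * 2 + [1] + [0] * 9,   # carried by 8 cases and 1 control
    [1] * 5 + [0] * 5 + [1] * 5 + [0] * 5,   # carried by 5 cases and 5 controls
]
pointwise, corrected = maxt_permutation(carriers, is_case)
```

Because the per-permutation maximum is never smaller than any single region's statistic, the corrected p-value is always at least as large as the pointwise one.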
We tested the hypothesis that increased summary measures of extended homozygosity (i.e., number of ROHs, total and average ROH lengths) are associated with LOAD using simple two-tailed t-tests to compare means between cases and controls. These measures were approximately normally distributed.
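A comparison of group means of this kind can be sketched with the Python standard library. The text does not say whether a pooled-variance or unequal-variance test was used; the pooled (Student's) form below is an assumption, as are the function name and the toy values. The p-value uses a normal approximation to the t distribution, which is reasonable only when the degrees of freedom are large, as they would be for a sample of 1387.

```python
import math
from statistics import mean, variance


def pooled_t_test(a, b):
    """Two-tailed Student's t-test with pooled variance.

    Returns (t, df, p_approx); p_approx is a normal approximation that is
    only appropriate for large degrees of freedom.
    """
    na, nb = len(a), len(b)
    df = na + nb - 2
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))
    p_approx = math.erfc(abs(t) / math.sqrt(2))   # two-sided tail area
    return t, df, p_approx


# Hypothetical toy values, far too few for the approximation to be accurate.
t_stat, dof, p_val = pooled_t_test([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
```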
In order to localize specific homozygous regions significantly associated with LOAD, empirical p-values for 50,000 permutations of homozygous consensus overlapping regions were determined using a modified CNV algorithm from the PLINK toolset, adapted to larger segments (Figure 1). One homozygous consensus region on chromosome 8 was found to be significantly overrepresented in cases when compared to controls (Table 2). This region does not coincide with homozygous regions previously reported to have a frequency above 25% in healthy individuals, nor was it found as an overlapping long contiguous stretch of homozygosity in three different populations (Han Chinese, Taiwan aborigines and Caucasians). The other top regions most significantly overrepresented in LOAD cases include additional loci on chromosomes 8, 9 and 10 and are associated with empirical p-values between 0.001 and 0.01 (0.02 and 0.1 after correction for multiple testing).
The genes present within the top 3% of homozygous consensus regions overrepresented in cases when compared with controls are shown in Table 2. Of the seven genes located in the homozygous region on chromosome 8 found to be most significantly different between cases and controls (RAB11FIP1, MGC33309, ADRB3, EIF4EBP1, ASH2L, STAR and LSM1), only the ADRB3 gene (GeneID: 155), coding for the beta-3-adrenergic receptor, is represented in the AlzGene database. One SNP in this gene (rs4998) was found to be associated with AD in ApoE4 negative samples by Hamilton et al. in a case-control association study performed for candidate insulin signaling genes. Additionally, the steroidogenic acute regulatory protein (StAR), coded by the STAR gene (GeneID: 6770), was found to be markedly increased in the cytoplasm of hippocampal pyramidal neurons and in the cytoplasm of other non-neuronal cell types from AD brains, when compared with age-matched controls. Also, the levels of phosphorylated eukaryotic translation initiation factor 4E binding protein 1, encoded by the gene EIF4EBP1 (GeneID: 1978), have previously been found to be dramatically increased in AD and to be significantly positively correlated with total tau and p-tau.
The AD cases studied presented a slightly higher degree of extended homozygosity when compared with the control group. The largest differences were observed in the total number of homozygous runs and the total run length, with suggestive p-values of 0.062 and 0.052, respectively, while the difference in average run length between the two groups was not statistically significant (p>0.10).
In order to identify risk loci significantly different between cases and controls that might contain genetic lesions responsible for AD, empirical p-values for consensus homozygous regions were determined in both the case and control groups. One consensus ROH on chromosome 8 was found to be significantly overrepresented in cases when compared to controls after correction for multiple testing. This region contains seven genes, three of which have previously been studied in relation to AD (Table 2).
The ADRB3 gene product, the beta-3-adrenergic receptor, is one of three beta-adrenergic receptor subtypes. The product is located mainly in adipose tissue and is involved in the regulation of adipocyte lipolysis, thermogenesis and oxygen consumption. There is evidence that a disturbance in the insulin signal transduction pathway may be a central and early pathophysiologic event in sporadic AD. Patients with diabetes and insulin resistance have an increased risk of impaired cognition or dementia. Hamilton et al. performed a two-stage association study to investigate the role of insulin signaling genes in the risk of developing AD, in which one SNP in ADRB3 (rs4998) was found to be associated with AD in ApoE4 negative samples.
The steroidogenic acute regulatory protein (StAR), coded by the STAR gene, plays a key role in the acute regulation of steroid hormone synthesis by enhancing the conversion of cholesterol into pregnenolone. To test the hypothesis that hormones of the hypothalamic-pituitary-gonadal axis have a role in AD pathogenesis, Webber et al. studied the expression levels of StAR in AD and control brains. The authors found an increase of this protein in AD hippocampal neurons as well as in other non-neuronal cells compared to age-matched controls. These results, together with the finding that StAR is present in the same brain regions as the luteinizing hormone (LH) receptors, suggest that steroidogenic pathways regulated by LH may play a role in AD.
The EIF4EBP1 gene encodes one member of a family of translation repressor proteins, which interact with eukaryotic translation initiation factor 4E (eIF4E). eIF4E is a limiting component of the multi-subunit complex that recruits 40S ribosomal subunits to the 5′ end of mRNAs. Interaction of 4E-BP1 with eIF4E inhibits assembly of the complex and represses translation. The 4E-BP1 protein is phosphorylated in response to several signals, including insulin signaling, and together with several other components of the translational machinery is regulated through signaling events that require the mammalian target of rapamycin (mTOR). Neurofibrillary tangles, composed mainly of hyperphosphorylated tau protein, are one of the neuropathological hallmarks of AD. Tau mRNA levels have been shown not to be altered in sporadic AD brains. Nevertheless, Li et al. studied the possibility that tau mRNAs in AD brains could be abnormally regulated by investigating the levels of various translation control elements, including 4E-BP1, in the brains of AD and control subjects. Together with increased levels of p-mTOR, they found increased levels of phosphorylated 4E-BP1 in AD and a significant positive correlation with total tau and phosphorylated tau.
Genetic variation in one or more of these genes may account for the development of different phenotypes related to LOAD, and it is possible that this form of genetic heterogeneity coexists with the multifactorial, common-disease/common-variant mode of inheritance that is generally studied in whole-genome association.
From a methodological perspective, using ROHs to identify possible candidate loci for small-effect-size or rare recessive variants in unrelated individuals may be useful. This method uses relatively few statistical tests compared to conventional genome-wide association studies to probabilistically model recessive genome-wide associations with disease, increasing power to some degree. The method itself likely trades specificity of results for sensitivity, by seeking to identify large regions of homozygosity harboring as-yet unknown recessive disease components. In particular, homozygosity mapping is limited in that fine mapping within the risk ROH is not feasible in the discovery population. To identify a more focused risk region when replicating a homozygosity mapping finding, a population of a different ethnic background or an admixed population would have to be studied, as the boundaries of the ROHs would likely differ from those in the discovery cohort. The summary measures of genome-wide extended homozygosity may also prove useful in quantifying distant consanguinity in a single unrelated population, for which Fst calculations may not be applicable to gauge genetic distance from a founder group.
Using public whole genome SNP analysis data on late onset Alzheimer’s disease and neurologically normal controls, we identified an average of 52.1 homozygous runs larger than 1 Mb containing more than 50 consecutive SNPs in 1387 samples. In order to determine whether Alzheimer’s disease populations are more consanguineous than healthy controls, we statistically compared the total number of homozygous runs, the average length of homozygous runs and the total length of genome contained in runs of homozygosity between cases and controls. For this analysis we used a stringent method, assuming that the presence of large tracts of contiguous homozygous SNPs is directly associated with increased levels of relatively recent consanguinity. Only homozygous runs with sizes over 1 Mb were included in the analysis in order to rule out the effect of copy number variation, which could confound results if smaller minimum run sizes were used. Our borderline significant results suggest that the AD cases in this study may be characterized by a higher degree of extended homozygosity than the control group, although this generalization would need to be tested in a larger sample.
As the individual ages were not publicly available, it was not possible to establish a relation between age and homozygosity. Still, our group has recently reported the first genetic proof that younger individuals present less homozygosity than older ones, presumably due to the increased mobility of populations in recent generations. Linear predictive models of homozygosity and current age showed that younger individuals have a smaller percentage of their genome contained in homozygous runs, and that these runs are significantly shorter in length (Nalls et al., unpublished data). Similar results would be expected for the present populations.
The E4 allele of APOE has been the only genetic risk factor consistently associated with familial and sporadic forms of AD. In the present study no association between homozygosity at the APOE gene and AD was found. This is explained by the size of the statistical window used (1 Mb), as the methodology was designed to pick up variants within recently created ROHs that have not been broken up by recombination. High recombination rates in this genomic region in the European American population prevent detection of any signal at the APOE gene using homozygosity mapping.
In summary, we conclude that extended runs of homozygosity are common in outbred populations and present statistical data that show specific large tracts of homozygosity are a risk factor for Alzheimer’s disease. The present results, together with the previously reported cases of consanguineous families with AD and the speculation that recessive genes for AD are responsible for the high AD prevalence in the Wadi Ara, demonstrate that particular regions of homozygosity have a significant role in AD genetics. It is possible that future sequencing and additional follow-up analyses will allow the identification of small-effect-size recessive risk variant(s) within the described ROHs.
This work was supported in part by the Intramural Research Program of the National Institute on Aging, National Institutes of Health, Department of Health and Human Services (Z01 AG000950-06) and the Portuguese Fundacao para a Ciencia e Tecnologia grants (SFRH/BD/29647/2006) and (SFRH/BD/27442/2006). The experiments presented here comply with the current laws of the United States of America.