Mutations in known causal Alzheimer disease (AD) genes account for only 1% to 3% of patients and almost all are dominantly inherited. Recessive inheritance of complex phenotypes can be linked to long (>1-megabase [Mb]) runs of homozygosity (ROHs) detectable by single-nucleotide polymorphism (SNP) arrays.
To evaluate the association between ROHs and AD in an African American population known to have a risk for AD up to 3 times higher than that of white individuals.
DESIGN, SETTING, AND PARTICIPANTS
This case-control study, conducted from December 2013 to January 2015, used a large African American data set previously genotyped on different genome-wide SNP arrays. Global and locus-based ROH measurements were analyzed using raw or imputed genotype data. We studied the raw genotypes from 2 case-control subsets grouped by SNP array: the Alzheimer’s Disease Genetics Consortium data set (871 cases and 1620 control individuals) and the Chicago Health and Aging Project–Indianapolis Ibadan Dementia Study data set (279 cases and 1367 control individuals). We then examined the entire data set using imputed genotypes from 1917 cases and 3858 control individuals.
MAIN OUTCOMES AND MEASURES
The ROHs larger than 1 Mb, 2 Mb, or 3 Mb were investigated separately for global burden evaluation, consensus regions, and gene-based analyses.
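The global burden measures named here can be sketched in a few lines of Python. These are the count, total length and mean length of ROHs above a size cutoff, plus the share of individuals carrying any such ROH. This is a minimal illustration and not the authors' pipeline. It assumes ROH segments have already been called and are given as base-pair coordinates. The `Segment`, `burden` and `carrier_fraction` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One called ROH on a chromosome, in base pairs (hypothetical structure)."""
    chrom: int
    start_bp: int
    end_bp: int

    @property
    def length_mb(self) -> float:
        return (self.end_bp - self.start_bp) / 1e6


def burden(segments, min_mb):
    """Number, total length and mean length (Mb) of ROHs strictly longer than min_mb."""
    lengths = [s.length_mb for s in segments if s.length_mb > min_mb]
    total = sum(lengths)
    mean = total / len(lengths) if lengths else 0.0
    return {"n": len(lengths), "total_mb": total, "mean_mb": mean}


def carrier_fraction(individuals, min_mb):
    """Share of individuals with at least one ROH longer than min_mb."""
    if not individuals:
        return 0.0
    carriers = sum(1 for segs in individuals if burden(segs, min_mb)["n"] > 0)
    return carriers / len(individuals)


# Example: one individual carrying a 3.5 Mb and a 1.5 Mb ROH
person = [Segment(1, 0, 3_500_000), Segment(2, 10_000_000, 11_500_000)]
print(burden(person, 1.0))  # {'n': 2, 'total_mb': 5.0, 'mean_mb': 2.5}
print(burden(person, 2.0))  # {'n': 1, 'total_mb': 3.5, 'mean_mb': 3.5}
```

Running `carrier_fraction` separately for cases and controls at 1, 2 and 3 Mb gives per-group summaries of the kind compared in the results. A formal test, such as a permutation or regression test, is still needed on top.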
The African American cohort had a low degree of inbreeding (F ~ 0.006). In the Alzheimer’s Disease Genetics Consortium data set, we detected a significantly higher proportion of cases with ROHs greater than 2 Mb (P = .004) or greater than 3 Mb (P = .02), as well as a significant 114-kilobase consensus region on chr4q31.3 (empirical P value 2 = .04; ROHs >2 Mb). In the Chicago Health and Aging Project–Indianapolis Ibadan Dementia Study data set, we identified a significant 202-kilobase consensus region on chr15q24.1 (empirical P value 2 = .02; ROHs >1 Mb) and a cluster of 13 significant genes on chr3p21.31 (empirical P value 2 = .03; ROHs >3 Mb). Of the 49 nominally significant genes common to both data sets, 43 also mapped to chr3p21.31. Analyses of imputed SNP data from the entire data set confirmed the association of AD with global ROH measurements (12.38 ROHs >1 Mb in cases vs 12.11 in controls; 2.986 Mb average size of ROHs >2 Mb in cases vs 2.889 Mb in controls; and 22% of cases with ROHs >3 Mb vs 19% of controls) and with a gene cluster on chr3p21.31 (empirical P value 2 = .006-.04; ROHs >3 Mb). We also detected a significant association between AD and CLDN17 (empirical P value 2 = .01; ROHs >1 Mb), which encodes a protein from the claudin family, members of which have previously been suggested as AD biomarkers.
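The empirical P values quoted above come from permutation testing. The following minimal sketch shows the general idea in the spirit of the EMP1 (pointwise) and EMP2 (max-statistic, family-wise) values reported by PLINK. It assumes a one-sided statistic, the case carrier frequency minus the control carrier frequency. The statistic, function names and permutation count are illustrative assumptions and not the study's settings.

```python
import random


def carrier_diff(carriers, is_case):
    """Case carrier frequency minus control carrier frequency for one region."""
    n_case = sum(is_case)
    n_ctrl = len(is_case) - n_case  # assumes at least one case and one control
    case_hits = sum(1 for c, y in zip(carriers, is_case) if c and y)
    ctrl_hits = sum(1 for c, y in zip(carriers, is_case) if c and not y)
    return case_hits / n_case - ctrl_hits / n_ctrl


def empirical_p(regions, is_case, n_perm=1000, seed=1):
    """regions: dict name -> list of bool carrier flags (same order as is_case).

    Returns dict name -> (EMP1, EMP2), where EMP2 compares each observed
    statistic with the maximum statistic over all regions in each permutation.
    """
    rng = random.Random(seed)
    observed = {r: carrier_diff(c, is_case) for r, c in regions.items()}
    hits1 = {r: 0 for r in regions}
    hits2 = {r: 0 for r in regions}
    labels = list(is_case)
    for _ in range(n_perm):
        rng.shuffle(labels)  # permute case/control labels across individuals
        perm = {r: carrier_diff(c, labels) for r, c in regions.items()}
        perm_max = max(perm.values())
        for r in regions:
            hits1[r] += perm[r] >= observed[r]
            hits2[r] += perm_max >= observed[r]
    return {r: ((hits1[r] + 1) / (n_perm + 1), (hits2[r] + 1) / (n_perm + 1))
            for r in regions}
```

Because the maximum over regions is always at least as large as any single region's statistic, EMP2 can never be smaller than EMP1. This matches its role as the multiple-testing-corrected value.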
CONCLUSIONS AND RELEVANCE
To our knowledge, we found the first evidence of an increased burden of ROHs among patients with AD from an outbred African American population. This could reflect either the cumulative effect of multiple ROHs on AD risk or the contribution of specific loci harboring recessive mutations and risk haplotypes in a subset of patients. Sequencing is required to uncover AD variants in these individuals.
In nucleus populations, genomic regions with a high frequency of runs of homozygosity (ROH) occur and are associated with reduced genetic diversity as well as adverse effects on fitness. It is currently unclear whether, and to what extent, these ROH stretches persist in the crossbred genome. It is also unclear how genomic management in the nucleus population might affect low-diversity regions, and what this implies for the crossbred genome.
We calculated a ROH statistic based on a minimum ROH length of 5 Mb (ROH5) or 10 Mb (ROH10) across the genome for genotyped Landrace (LA), Large White (LW) and Duroc (DU) dams. We simulated crossbred dam (LA × LW) and market [DU × (LA × LW)] animal genotypes based on observed parental genotypes and tabulated the ROH frequency. Using observed genotypes, we conducted a simulation to determine how minimizing parental relationships within nucleus herds affects multiple diversity metrics. Relationships were based on the pedigree relationship matrix (A), a SNP-by-SNP genomic relationship matrix or a ROH-based relationship matrix. Genome-wide metrics included pedigree inbreeding, heterozygosity and the proportion of the genome in ROH of at least 5 Mb. Lastly, the genome was split into bins of increasing ROH5 frequency and, within each bin, heterozygosity, ROH5 and ROH length (Mb) were evaluated.
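A per-SNP ROH statistic such as ROH5 can be read as the fraction of animals in which a SNP falls inside a ROH of at least a given length. The following is a minimal Python sketch for one chromosome. It assumes ROH segments are already called and given as (start, end) base-pair pairs per animal. `roh_frequency` is a hypothetical helper, not the authors' code.

```python
from bisect import bisect_left, bisect_right


def roh_frequency(snp_pos, animals, min_len_bp):
    """Fraction of animals in which each SNP lies in a ROH of at least min_len_bp.

    snp_pos: sorted SNP positions on one chromosome.
    animals: one list of (start_bp, end_bp) ROH segments per animal.
    """
    counts = [0] * len(snp_pos)
    for segments in animals:
        covered = [False] * len(snp_pos)  # avoid double-counting overlapping segments
        for start, end in segments:
            if end - start < min_len_bp:
                continue
            for i in range(bisect_left(snp_pos, start), bisect_right(snp_pos, end)):
                covered[i] = True
        for i, flag in enumerate(covered):
            counts[i] += flag
    n = len(animals)
    return [c / n for c in counts] if n else [0.0] * len(snp_pos)
```

Calling this once with `min_len_bp=5_000_000` and once with `10_000_000` gives ROH5-style and ROH10-style frequency profiles along the chromosome.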
We detected regions with high frequencies of ROH5 and/or ROH10 in both LW and LA on SSC1, SSC4 and SSC14, and in all breeds on SSC9. Long haplotypes were shared across parental breeds and, thus, regions of ROH persisted in crossbred animals. Averaged across replicates and breeds, constraining genomic relationships gave progeny higher heterozygosity (0.0056 ± 0.002%) than their parents. It also gave a lower proportion of the genome in ROH of at least 5 Mb (−0.015 ± 0.003%). Constraining pedigree relationships resulted in negligible differences at the genomic level. Across all breeds, only genomic data were able to target low-diversity regions.
We show that long stretches of ROH present in the parents persist in crossbred animals. Furthermore, compared with pedigree relationships, using genomic information to constrain parental relationships maintained more genetic diversity and targeted low-diversity regions more effectively.
Electronic supplementary material
The online version of this article (doi:10.1186/s12711-016-0269-y) contains supplementary material, which is available to authorized users.
Runs of homozygosity (ROHs), in which both parental alleles are identical, have been proposed to have recessive effects on multiple human complex diseases. Osteoporosis is a common complex disease characterized by low bone mineral density (BMD), a highly heritable trait, and recessive loci that contribute to BMD variation have been identified. In this study, we performed a genome-wide ROH association study using our SNP array data from three GWAS samples including 4,900 subjects from Caucasian and Chinese populations. Significant results were further subjected to replication in 3,747 additional subjects. ROHs associated with BMD were also tested for associations with osteoporotic fractures in a GWAS fracture sample. Combining results from all the samples, we identified 697 autosomal regions with ROHs. Among these, we detected genome-wide significant associations between BMD and 6 ROHs: ROH1q31.3, 1p31.1, 3q26.1, 11q12.1, 21q22.1 and 15q22.3 (combined P = 3.17 × 10⁻⁸ to 6.29 × 10⁻⁵). Notably, ROH1p31.1 was associated with increased risk of osteoporotic hip fractures (odds ratio [OR] 3.71, P = 0.032). To investigate the functional relevance of the identified ROHs, we performed cis-expression quantitative trait locus (eQTL) analysis in lymphoblastoid cell lines. Three ROHs, ROH1p31.1, 11q12.1 and 15q22.3, were significantly associated with mRNA expression levels of nearby genes (eQTL P < 0.05). In summary, our findings reveal that ROHs could act as recessive determinants contributing to the pathogenesis of osteoporosis. Further molecular and functional studies are needed to clarify the underlying mechanism.
ROHS; BMD; FRACTURES; ASSOCIATION
The search for novel Alzheimer disease (AD) genes or pathologic mutations within known AD loci is ongoing. The development of array technologies has helped to identify rare recessive mutations among long runs of homozygosity (ROHs), in which both parental alleles are identical. Caribbean Hispanics are known to have an elevated risk for AD and tend to have large families with evidence of inbreeding.
To test the hypothesis that late-onset AD in a Caribbean Hispanic population might be explained in part by homozygosity at unknown loci that could harbor recessive AD risk haplotypes or pathologic mutations.
We used genome-wide array data to identify ROHs (>1 megabase) and conducted global burden and locus-specific ROH analyses.
A whole-genome case-control ROH study.
A Caribbean Hispanic data set of 547 unrelated cases (48.8% with familial AD) and 542 controls collected from a population known to have a 3-fold higher risk of AD than non-Hispanics in the same community. Based on analysis with the STRUCTURE program, our data set consisted of African Hispanic (207 cases and 192 controls) and European Hispanic (329 cases and 326 controls) participants.
Alzheimer disease risk genes.
MAIN OUTCOMES AND MEASURES
We calculated the total and mean lengths of the ROHs per sample. Global burden measurements across autosomes were compared between cases and controls. Pools of overlapping ROH segments (consensus regions) were identified, and the case-to-control ratio was calculated for each consensus region. We formulated the tested hypothesis before data collection.
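The following simplified sketch groups overlapping ROH segments into pools and takes the span shared by every segment in a pool as its consensus region. It then tallies distinct case and control carriers to obtain the case-to-control ratio. PLINK's own pool and consensus definitions offer additional options, such as allelic matching. This is therefore only an illustration under simplifying assumptions, and `roh_pools` is a hypothetical name.

```python
def roh_pools(segments):
    """Group transitively overlapping ROH segments on one chromosome into pools.

    segments: list of (start_bp, end_bp, sample_id, is_case) tuples.
    Returns one dict per pool with its full span, the consensus span shared by
    every member (None if the members do not all overlap), distinct case and
    control carriers, and the case-to-control ratio.
    """
    pools, current, current_end = [], [], None
    for seg in sorted(segments):  # sweep in order of start position
        start, end = seg[0], seg[1]
        if current and start <= current_end:
            current.append(seg)
            current_end = max(current_end, end)
        else:
            if current:
                pools.append(current)
            current, current_end = [seg], end
    if current:
        pools.append(current)

    summaries = []
    for pool in pools:
        cons_start = max(s[0] for s in pool)
        cons_end = min(s[1] for s in pool)
        cases = {s[2] for s in pool if s[3]}
        controls = {s[2] for s in pool if not s[3]}
        summaries.append({
            "span": (min(s[0] for s in pool), max(s[1] for s in pool)),
            "consensus": (cons_start, cons_end) if cons_start < cons_end else None,
            "n_cases": len(cases),
            "n_controls": len(controls),
            "ratio": len(cases) / len(controls) if controls else float("inf"),
        })
    return summaries
```

Counting distinct sample IDs rather than segments keeps an individual with two nearby ROHs from being counted twice in the same pool.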
In total, we identified 17 137 autosomal regions with ROHs. The mean length of the ROH per person was significantly greater in cases vs controls (P = .0039), and this association was stronger with familial AD (P = .0005). Among the European Hispanics, a consensus region at the EXOC4 locus was significantly associated with AD even after correction for multiple testing (empirical P value 1 [EMP1], .0001; EMP2, .002; 21 AD cases vs 2 controls). Among the African Hispanic subset, the most significant but nominal association was observed for CTNNA3, a well-known AD gene candidate (EMP1, .002; 10 AD cases vs 0 controls).
CONCLUSIONS AND RELEVANCE
Our results show that ROHs could significantly contribute to the etiology of AD. Future studies would require the analysis of larger, relatively inbred data sets that might reveal novel recessive AD genes. The next step is to conduct sequencing of top significant loci in a subset of samples with overlapping ROHs.
Variation in environment, management practices, nutrition or selection objectives has led countries to make different choices in the use of genetic material. Differences in genome-level homozygosity between countries may cause the regions responsible for inbreeding depression to differ between countries. The objective of this study was to characterize regions that have an impact on a runs of homozygosity (ROH) metric. A further objective was to estimate their association with the additive genetic effect of milk (MY), fat (FY) and protein yield (PY) and calving interval (CI) in Australian (AU) and United States (US) Jersey cows.
Genotyped cows with phenotypes on MY, FY and PY (n = 6751 US; n = 3974 AU) and CI (n = 5816 US; n = 3905 AU) were used in a two-stage analysis. A ROH statistic (ROH4Mb), which counts the frequency of a SNP being in a ROH of at least 4 Mb, was calculated across the genome. In the first stage, residuals were obtained from a model that accounted for the portion of the phenotype explained by the estimated breeding value. In the second stage, these residuals were regressed on ROH4Mb using a single-marker regression model and a gradient boosted machine (GBM) algorithm. The relationship between the additive effect and the ROH4Mb effect of a region was characterized based on the (co)variance of estimated genomic breeding values for 500-kb regions derived from a Bayesian LASSO analysis. The phenotypes used to estimate ROH4Mb and additive effects were the residuals from the two-stage approach and yield deviations, respectively.
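The two-stage logic can be illustrated with ordinary least squares in place of the study's models. This sketch assumes one phenotype and one estimated breeding value (EBV) per animal. It codes ROH status at a single SNP as a 0/1 indicator of whether that animal carries a ROH of at least 4 Mb over the SNP, which is our assumed individual-level coding of ROH4Mb. It does not reproduce the GBM or Bayesian LASSO analyses, and the function names are hypothetical.

```python
def ols_fit(x, y):
    """Simple linear regression y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def two_stage_roh_effect(phenotype, ebv, roh_indicator):
    """Stage 1: remove the part of the phenotype explained by the EBV.
    Stage 2: regress those residuals on 0/1 ROH status at one SNP."""
    a, b = ols_fit(ebv, phenotype)
    residuals = [p - (a + b * e) for p, e in zip(phenotype, ebv)]
    _, roh_effect = ols_fit(roh_indicator, residuals)
    return roh_effect


# Simulated example: phenotype = 10 + 2*EBV - 5*ROH
ebv = [0, 1, 2, 3, 0, 1, 2, 3]
roh = [0, 0, 0, 0, 1, 1, 1, 1]
pheno = [10 + 2 * e - 5 * r for e, r in zip(ebv, roh)]
print(two_stage_roh_effect(pheno, ebv, roh))  # -5.0
```

In this toy example the EBV and ROH status are uncorrelated, so the second stage recovers the simulated ROH effect exactly. When they are correlated, stage 1 absorbs part of the ROH signal. That is one reason the study examined the covariance between additive and ROH4Mb effects.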
Associations between yield traits and ROH4Mb were found for regions on BTA13, BTA23 and BTA25 for the US population and on BTA3, BTA7 and BTA17 for the AU population. Only one association (BTA7) was found between CI and ROH4Mb for the US population. Multiple potential epistatic interactions were characterized based on the GBM analysis. Lastly, the sign of the covariance between the ROH4Mb and additive SNP effects of a region was heterogeneous across the genome.
We identified multiple genomic regions associated with ROH4Mb in US and AU Jersey females. The covariances between the ROH4Mb and additive genetic effects of these regions were both positive and negative, which provides evidence that the effect of homozygosity depends on genomic location.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-2001-7) contains supplementary material, which is available to authorized users.
Genome-wide association studies (GWASs) help to understand the effects of single nucleotide polymorphisms (SNPs) on breast cancer (BC) progression and survival. We performed multiple analyses on data from a previously conducted GWAS for the influence of individual SNPs, runs of homozygosity (ROHs) and inbreeding on BC survival. (I.) The association of individual SNPs indicated no differences in the proportions of homozygous individuals between short-time survivors (STSs) and long-time survivors (LTSs). (II.) The analysis revealed differences among the populations in the number of ROHs per person and the total and average length of ROHs per person. It also revealed differences between LTSs and STSs in the number of ROHs per person. (III.) Common ROHs at particular genomic positions were nominally more frequent among LTSs than among STSs. Common ROHs showed significant evidence for natural selection (iHS, Tajima’s D, Fay-Wu’s H). Most regions could be linked to genes related to BC progression or treatment. (IV.) Results were supported by a higher level of inbreeding among LTSs. Our results showed that an increased level of homozygosity may favor individuals during BC treatment. Although common ROHs were short, variants within ROHs might favor survival in BC and may function in a recessive manner.
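Of the selection statistics named above, Tajima's D is the simplest to compute directly. It contrasts mean pairwise diversity (π) with Watterson's estimator (S/a₁). The following minimal sketch uses the standard Tajima normalizing constants. It assumes aligned haplotypes coded as equal-length strings with no missing data. It does not cover iHS or Fay and Wu's H.

```python
import math
from itertools import combinations


def tajimas_d(haplotypes):
    """Tajima's D for aligned haplotypes (equal-length strings, no missing data).

    Returns 0.0 when there are no segregating sites (D is undefined there).
    """
    n = len(haplotypes)
    n_sites = len(haplotypes[0])
    # S: number of segregating (polymorphic) sites
    s = sum(1 for j in range(n_sites) if len({h[j] for h in haplotypes}) > 1)
    if s == 0:
        return 0.0
    # pi: mean number of pairwise differences between haplotypes
    pairs = list(combinations(haplotypes, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    # Standard normalizing constants
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))
```

An excess of rare variants, as after a sweep, pushes D below zero. An excess of intermediate-frequency variants, as under balancing selection, pushes it above zero.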
Parkinson's disease (PD) occurs in both familial and sporadic forms, and both monogenic and complex genetic factors have been identified. Early onset PD (EOPD) is particularly associated with autosomal recessive (AR) mutations, and three genes, PARK2, PARK7 and PINK1, have been found to carry mutations leading to AR disease. Since mutations in these genes account for less than 10% of EOPD patients, we hypothesized that further recessive genetic factors are involved in this disorder, which may appear in extended runs of homozygosity.
We carried out genome-wide SNP genotyping to look for extended runs of homozygosity (ROHs) in 1,445 EOPD cases and 6,987 controls. Logistic regression analyses showed an increased level of genomic homozygosity in EOPD cases compared to controls. These differences are larger for ROH of 9 Mb and above, where there is a more than three-fold increase in the proportion of cases carrying a ROH. These differences are not explained by occult recessive mutations at existing loci. Controlling for genome-wide homozygosity in logistic regression analyses increased the differences between cases and controls, indicating that in EOPD cases ROHs do not simply relate to genome-wide measures of inbreeding. Homozygosity at a locus on chromosome 19p13.3 was identified as being more common in EOPD cases than in controls. Sequencing analysis of genes and predicted transcripts within this locus failed to identify a novel mutation causing EOPD in our cohort.
There is an increased rate of genome wide homozygosity in EOPD, as measured by an increase in ROHs. These ROHs are a signature of inbreeding and do not necessarily harbour disease-causing genetic variants. Although there might be other regions of interest apart from chromosome 19p13.3, we lack the power to detect them with this analysis.
Genome-wide association studies (GWASs) have identified several single-nucleotide polymorphisms (SNPs) influencing the risk of thyroid cancer (TC). Most cancer predisposition genes identified through GWASs function in a co-dominant manner, and studies have not found evidence for recessively functioning disease loci in TC. Our study examines whether homozygosity is associated with an increased risk of TC and searches for novel recessively acting disease loci.
Data from a previously conducted GWAS were used for the estimation of the proportion of phenotypic variance explained by all common SNPs, the detection of runs of homozygosity (ROH) and the determination of inbreeding to unravel their influence on TC.
Inbreeding coefficients were significantly higher among cases than controls. SNP-by-SNP association tests were corrected for multiple testing using the false discovery rate at q* < 0.05, and 34 SNPs showed significant differences in homozygosity between cases and controls. The average size, number and total length of ROHs per person were significantly higher in cases than in controls. A total of 16 recurrent ROHs of rather short length were identified, although their association with TC risk was not significant at a genome-wide level. Several recurrent ROHs harbor genes associated with TC risk. All of the ROHs showed significant evidence for natural selection (iHS, Fst, Fay and Wu’s H).
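False discovery rate control is commonly done with the Benjamini–Hochberg step-up procedure. The abstract does not say which FDR procedure was used. The following is therefore a minimal sketch of Benjamini–Hochberg as one standard choice, and not necessarily the study's method.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Indices of hypotheses rejected at FDR level q (Benjamini-Hochberg step-up)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up rule: keep the largest rank whose P value is under its threshold
        if pvalues[idx] <= rank * q / m:
            k_max = rank
    return sorted(order[:k_max])
```

Because it is a step-up rule, a P value that misses its own threshold can still be rejected when a larger P value further down the ranking passes. For example, `[0.04, 0.045]` rejects both hypotheses at q = 0.05.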
Our results support the existence of recessive alleles in TC susceptibility. Although regions of homozygosity were rather small, it might be possible that variants within these ROHs affect TC risk and may function in a recessive manner.
Electronic supplementary material
The online version of this article (doi:10.1186/s12885-016-2264-7) contains supplementary material, which is available to authorized users.
Thyroid cancer; Runs of homozygosity; Inbreeding; GWAS
Dairy cattle breeding objectives are generally similar across countries, but environment and management conditions may vary, giving rise to slightly different selection pressures on a given trait. This may in turn exert different selection pressures on loci across the genome that, if large enough, may give rise to population-specific regions with high levels of homozygosity. The objective of this study was to characterize differences and similarities in the location and frequency of homozygosity-related measures in Jersey dairy cows and bulls from the United States (US), Australia (AU) and New Zealand (NZ).
The populations consisted of a subset of genotyped Jersey cows born in the US (n = 1047) and AU (n = 886) and progeny-tested Jersey bulls from the US (n = 736), AU (n = 306) and NZ (n = 768). Differences and similarities across populations were characterized using a principal component analysis (PCA) and a run of homozygosity (ROH) statistic (ROH45), which counts the frequency of a single nucleotide polymorphism (SNP) being in a ROH of at least 45 SNPs. Regions with high ROH45 frequencies, and those with significantly different ROH45 frequencies between populations, were investigated for their association with milk yield traits. Within sex, the PCA revealed slight differentiation between the populations, with the greatest differentiation between US and NZ bulls. Regions with high levels of ROH45 in all populations were detected on BTA3 and BTA7. Several other regions differed in ROH45 frequency across populations, with the largest number in the US versus NZ bull contrast. In addition, multiple regions with different ROH45 frequencies across populations were associated with milk yield traits.
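ROH45 is defined by a SNP count rather than a physical length. A simple sketch can therefore detect runs of at least 45 consecutive homozygous genotypes and then count, per SNP, the share of animals in which that SNP lies in such a run. This assumes genotypes coded as 0/1/2 allele dosages, with `None` for missing calls. Any heterozygous or missing call ends a run. Real ROH callers usually tolerate a few such calls, so this is a simplification, and the function names are hypothetical.

```python
def runs_by_snp_count(genotypes, min_snps=45):
    """(first_index, last_index) of runs of >= min_snps consecutive homozygous calls.

    genotypes: 0/1/2 allele dosages along one chromosome; None marks a missing call.
    """
    runs, start = [], None
    for i, g in enumerate(list(genotypes) + [1]):  # sentinel heterozygote closes a trailing run
        if g in (0, 2):
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_snps:
                runs.append((start, i - 1))
            start = None
    return runs


def roh45_frequency(panel, min_snps=45):
    """Per-SNP share of animals whose genotype places that SNP inside a qualifying run."""
    counts = [0] * len(panel[0])
    for genotypes in panel:
        for first, last in runs_by_snp_count(genotypes, min_snps):
            for i in range(first, last + 1):
                counts[i] += 1  # runs within one animal never overlap
    return [c / len(panel) for c in counts]
```

Comparing the resulting per-SNP frequencies between, say, US and NZ bulls is the kind of contrast used to flag regions with differential ROH45.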
Multiple regions exhibited differential ROH45 across AU, NZ and US cow and bull populations; one interpretation is that these regions of the genome are undergoing differential directional selection. Two regions on BTA3 and BTA7 had high ROH45 frequencies across all populations and will be investigated further to determine the gene(s) undergoing directional selection.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-1352-4) contains supplementary material, which is available to authorized users.
Dairy cattle; Runs of homozygosity; Signature of selection
Inbreeding is often an inevitable outcome of strong directional artificial selection, but on average it reduces population fitness by increasing homozygosity for recessive deleterious alleles. Runs of homozygosity (ROH) represent genomic autozygosity arising from matings between selected, genomically related individuals, and may reveal the regions affecting fitness. To examine the influence of genomic autozygosity on fitness, we used a genome-wide association test to evaluate potential negative correlations between ROH and daughter pregnancy rate (DPR) or somatic cell score (SCS) in US Jersey cattle. In addition, relationships between changes in local ROH and inbreeding coefficients (F) were assessed to locate genomic regions with increased inbreeding. We found some decreases in fertility associated with incremental increases in F, but most emerging local ROH were not significantly associated with DPR or SCS. Furthermore, the ROH analyses, including the associations of ROH with F and with traits, could be approximated using the most frequent haplotype(s). Analysis of the most frequent haplotype revealed that associations between ROH and fertility could be accounted for by the additive genetic effect on the trait. Thus, in Jersey cattle, we suggest that a change in autozygosity is more likely to reveal footprints of haplotypes selected for production. It is less likely to highlight increased local autozygosity of a recessive detrimental allele resulting from matings between closely related animals.
Modern horses represent heterogeneous populations specifically selected for appearance and performance. Genomic regions under high selective pressure show characteristic runs of homozygosity (ROH), which reflect low genetic diversity. This study aims to detect the number and functional distribution of ROHs in different horse populations using next-generation sequencing data.
Next-generation sequencing was performed for two Sorraia, one Dülmen Horse, one Arabian, one Saxon-Thuringian Heavy Warmblood, one Thoroughbred and four Hanoverian horses. After quality control, reads were mapped to the reference genome EquCab2.70. ROH detection was performed using PLINK version 1.07 on a trimmed dataset with 11,325,777 SNPs and a mean read depth of 12. Stretches of homozygous genotypes longer than 40 kb and longer than 400 kb were defined as ROHs. SNPs within consensus ROHs were tested for neutrality. Genes annotated within ROHs were functionally classified using PANTHER gene list analysis, and functional variants were tested for their distribution among breed and non-breed groups.
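PLINK's ROH caller scans fixed-size SNP windows. It flags a SNP as homozygous when enough of the windows overlapping it are themselves (nearly) homozygous. The sketch below imitates that window-scanning idea in plain Python. Parameter names and default values are illustrative assumptions, not PLINK's exact options or the settings used in this study. PLINK's additional SNP-density and gap checks are omitted.

```python
def sliding_window_roh(positions, genotypes, window=50, max_het=1, max_missing=5,
                       hit_threshold=0.05, min_snps=100, min_kb=1000):
    """Window-scanning ROH sketch for one chromosome.

    positions: sorted bp positions; genotypes: 0/1/2 dosages with None for missing.
    Returns (start_bp, end_bp) of segments passing the SNP-count and length filters.
    """
    n = len(genotypes)
    if n < window:
        return []
    n_windows = n - window + 1
    # 1. Classify every window as homozygous (few hets and missing calls) or not.
    hom_window = []
    for s in range(n_windows):
        block = genotypes[s:s + window]
        hets = sum(1 for g in block if g == 1)
        missing = sum(1 for g in block if g is None)
        hom_window.append(hets <= max_het and missing <= max_missing)
    prefix = [0]
    for flag in hom_window:
        prefix.append(prefix[-1] + flag)
    # 2. A SNP is "in ROH" if a large enough share of the windows covering it are homozygous.
    in_roh = []
    for i in range(n):
        lo, hi = max(0, i - window + 1), min(i, n_windows - 1)
        share = (prefix[hi + 1] - prefix[lo]) / (hi - lo + 1)
        in_roh.append(share >= hit_threshold)
    # 3. Merge consecutive flagged SNPs and apply the minimum-size filters.
    segments, i = [], 0
    while i < n:
        if not in_roh[i]:
            i += 1
            continue
        j = i
        while j + 1 < n and in_roh[j + 1]:
            j += 1
        if j - i + 1 >= min_snps and (positions[j] - positions[i]) / 1000 >= min_kb:
            segments.append((positions[i], positions[j]))
        i = j + 1
    return segments
```

Lowering `min_kb` to 40 or 400 would mirror the two length cutoffs used here. With sequence-level SNP density, the window and SNP-count settings would also need rescaling.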
ROH detection was performed using whole-genome sequences of ten horses from six populations representing various breed types and non-breed horses. On average, 3492 ROHs were detected in windows of at least 50 consecutive homozygous SNPs and 292 ROHs in windows of 500 consecutive homozygous SNPs. Functional analyses of private ROHs in each horse revealed a high frequency of genes affecting cellular, metabolic, developmental, immune system and reproduction processes. In non-breed horses, 198 ROHs in 50-SNP windows and seven ROHs in 500-SNP windows showed an enrichment of genes involved in reproduction, embryonic development, energy metabolism, and muscle and cardiac development. By contrast, all seven breed horses shared only three common ROHs in 50-SNP windows, harboring the fertility-related gene YES1. In the Hanoverian, a total of 18 private ROHs were located in the region of genes potentially involved in neurologic control, signaling, glycogen balance and reproduction. Comparative analysis of homozygous stretches common to all ten horses revealed three ROHs, all located in the region of KITLG, the ligand of KIT, known to be involved in melanogenesis, haematopoiesis and gametogenesis.
The results of this study give a comprehensive insight into the frequency and number of ROHs in various horses and their potential influence on population diversity and selection pressures. Comparisons of breed and non-breed horses suggest a significant artificial as well as natural selection pressure on reproduction performance in all types of horse populations.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-1977-3) contains supplementary material, which is available to authorized users.
Runs of homozygosity; Horse population; Selection signature; Reproduction; KITLG
Recent developments in sequencing technology have facilitated widespread investigations of genomic variants, including continuous stretches of homozygous genomic regions. For cattle, a large proportion of these runs of homozygosity (ROH) are likely the result of inbreeding due to the accumulation of elite alleles from long-term selective breeding programs. In the present study, ROH were characterized in four cattle breeds with whole genome sequence data and the distribution of predicted functional variants was detected in ROH regions and across different ROH length classes.
On average, 19.5 % of the genome was located in ROH across the four cattle breeds. There was an average of 715.5 ROH per genome, with an average size of ~750 kbp, ranging from 10 kbp (the minimum size considered) to 49,290 kbp. There was a significant correlation between shared short ROH regions and regions putatively under selection (p < 0.001). By investigating the relationship between ROH and predicted deleterious and non-deleterious variants, we gained insight into the distribution of functional variation in inbred (ROH) regions. Predicted deleterious variants were more enriched in ROH regions than predicted non-deleterious variants, consistent with observations in the human genome. We also found that the enrichment of deleterious variants was significantly higher in short (<100 kbp) and medium (0.1 to 3 Mbp) ROH regions than in long (>3 Mbp) ROH regions (P < 0.001). This differs from what has been observed in the human genome.
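Summaries such as the share of the genome in ROH and the short/medium/long ROH classes reduce to simple arithmetic on called segments. The following minimal sketch assumes non-overlapping (start, end) base-pair segments and a user-supplied autosomal genome length. The class boundaries follow the cut-offs quoted above, and the function names are hypothetical.

```python
def f_roh(segments, autosome_length_bp):
    """Proportion of the autosomal genome covered by ROH (segments must not overlap)."""
    return sum(end - start for start, end in segments) / autosome_length_bp


def length_classes(segments):
    """Count ROH in the short (<0.1 Mb), medium (0.1-3 Mb) and long (>3 Mb) classes."""
    counts = {"short": 0, "medium": 0, "long": 0}
    for start, end in segments:
        length_mb = (end - start) / 1e6
        if length_mb < 0.1:
            counts["short"] += 1
        elif length_mb <= 3:
            counts["medium"] += 1
        else:
            counts["long"] += 1
    return counts
```

Long ROH arise from recent common ancestors, because fewer generations of recombination have broken them up. Classifying by length is therefore a rough way to separate recent from older inbreeding.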
This study illustrates the distribution of ROH and functional variants within ROH in cattle populations. These patterns are different from those in the human genome but consistent with the natural history of cattle populations, which is confirmed by the significant correlation between shared short ROH regions and regions putatively under selection. These findings contribute to understanding the effects of inbreeding and probably selection in shaping the distribution of functional variants in the cattle genome.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-1715-x) contains supplementary material, which is available to authorized users.
Runs of homozygosity; Polymorphisms; Inbreeding; Cattle; Genome sequencing
Runs of homozygosity (ROH) are extended stretches of homozygous genotypes spanning long genomic distances. In oncology, such a stretch is known as loss of heterozygosity (LOH) if it is identified exclusively in cancer cells rather than in matched control cells. Studies have identified several genomic regions that show consistent ROH in different kinds of carcinoma. To ask whether this consistency extends more broadly, to more cancer types and wider genomic regions, we investigated ROH patterns in the National Cancer Institute 60 cancer cell line panel (NCI-60) and HapMap Caucasian healthy trio families. Using results from Affymetrix 500K SNP arrays, we report a genome-wide significant association of ROH regions between the NCI-60 and HapMap samples, with a much higher level of ROH (11-fold) in the cancer cell lines. Analysis shows that the more severe ROH found in cancer cells appear to be extensions of ROH already present in the healthy state. In the HapMap trios, the adult subgroup had a slightly but significantly higher level of ROH (1.02-fold) than the young subgroup. For several ROH regions we observed co-occurrence with fragile sites (FRAs). However, at the genome-wide level, FRAs show no clear relationship with ROH regions.
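Under the LOH definition above, tumor-specific homozygosity is what remains of the cancer cell's ROH after removing stretches that are also homozygous in the matched control. The following is a minimal interval-subtraction sketch on one chromosome, assuming half-open [start, end) coordinates. The NCI-60 versus HapMap comparison in this abstract did not use matched pairs, so this only illustrates the definition. `subtract_intervals` is a hypothetical name.

```python
def subtract_intervals(tumor, normal):
    """Parts of tumor ROH intervals not covered by any matched-normal ROH interval.

    Intervals are half-open (start, end) tuples on one chromosome.
    """
    normal = sorted(normal)
    result = []
    for start, end in sorted(tumor):
        cursor = start  # leftmost position of the tumor interval not yet resolved
        for n_start, n_end in normal:
            if n_end <= cursor:
                continue
            if n_start >= end:
                break
            if n_start > cursor:
                result.append((cursor, n_start))  # uncovered gap before this normal ROH
            cursor = max(cursor, n_end)
            if cursor >= end:
                break
        if cursor < end:
            result.append((cursor, end))
    return result


print(subtract_intervals([(0, 100)], [(20, 30), (50, 60)]))
# [(0, 20), (30, 50), (60, 100)]
```

Applied genome-wide to tumor and matched-normal calls, the total length of the output would be one simple measure of LOH burden.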
An increased abundance of runs of homozygosity (ROH) has been associated with risk for various diseases, including schizophrenia. Here we investigate the characteristics of ROH in Palau, an Oceanic population, evaluating whether these characteristics are related to risk for psychotic disorders and the nature of this association. To accomplish these aims we evaluate a sample of 203 cases with schizophrenia and related psychotic disorders – representing almost complete ascertainment of affected individuals in the population – and contrast their ROH to that of 125 subjects chosen to function as controls.
While Palauans diagnosed with psychotic disorders tend to have slightly more ROH regions than controls, the distinguishing features are that they have longer ROH regions and a greater total length of ROH, and that their ROH tend to co-occur more often at the same locus. The nature of the sample allows us to investigate whether rare, highly penetrant recessive variants generate such case-control differences in ROH. Neither rare, highly penetrant recessive variants nor individual common variants of large effect account for a substantial proportion of risk for psychosis in Palau. These results suggest that a more nuanced model of risk is required to explain patterns of ROH in this population.
Homozygosity; ROH; Schizophrenia; Psychosis; Palau
Runs of homozygosity (ROHs) are an important but poorly studied class of genomic variation and may be involved in individual susceptibility to disease. To better understand ROHs and their relationship with lung cancer, we performed a genome-wide ROH analysis of a subset of a previous genome-wide case-control study (1,473 cases and 1,962 controls) in a Han Chinese population. ROHs were classified by length into two classes, intermediate and long. We then evaluated their association with lung cancer risk using existing genome-wide single nucleotide polymorphism (SNP) data. We found that the overall level of intermediate ROHs was significantly associated with a decreased risk of lung cancer (odds ratio = 0.63; 95% confidence interval: 0.51-0.77; P = 4.78 × 10⁻⁶), while long ROHs appeared to be a risk factor for lung cancer. We also identified one ROH region at 14q23.1 that was consistently associated with lung cancer risk in the study. These results suggest that ROHs may be a new class of variation associated with lung cancer risk, and that genetic variants at 14q23.1 may be involved in the development of lung cancer.
lung cancer; runs of homozygosity (ROHs); genome-wide association study
The increasing availability of DNA markers provides new metrics of inbreeding based on single nucleotide polymorphisms (SNPs), i.e. molecular inbreeding or the proportion of runs of homozygosity (ROH), as alternatives to traditional pedigree-based inbreeding coefficients. However, none of these metrics incorporate the length of ROH as an indicator of recent inbreeding. Novel inbreeding coefficients that incorporate length of ROH as a random variable with an associated density are investigated.
New inbreeding metrics based on the distribution of ROH length are proposed: (1) the Kolmogorov–Smirnov test, (2) a function of the quantiles of the cumulative distribution function of an individual versus the population, and (3) fitting of an exponential distribution to ROH lengths (mean, variance, and the probability of drawing at random a ROH larger than a given threshold). The new inbreeding metrics and the pedigree-based metrics were compared using 217 sows of an Iberian line belonging to three groups: C1 (conservation), C2 (conservation derived from C1), and S (selected and derived from C1). All sows had complete pedigrees and were genotyped for 35,023 SNPs.
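Two of the proposed metrics are easy to sketch. The first is a two-sample Kolmogorov–Smirnov distance between an individual's ROH lengths and the population's. The second is an exponential fit to ROH lengths with its mean, variance and tail probability. This minimal sketch assumes lengths in Mb and uses the maximum-likelihood exponential, whose mean equals the sample mean. It ignores the minimum-length truncation of called ROH, which the authors may have handled differently, and the function names are hypothetical.

```python
import math
from bisect import bisect_right


def ks_distance(individual, population):
    """Two-sample Kolmogorov-Smirnov D between two sets of ROH lengths."""
    a, b = sorted(individual), sorted(population)
    d = 0.0
    for x in a + b:  # the supremum of |F_a - F_b| is attained at an observed value
        fa = bisect_right(a, x) / len(a)
        fb = bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d


def exponential_summary(lengths_mb, threshold_mb):
    """ML exponential fit: returns (mean, variance, P(length > threshold_mb))."""
    mean = sum(lengths_mb) / len(lengths_mb)  # MLE of the exponential mean
    return mean, mean ** 2, math.exp(-threshold_mb / mean)
```

An individual whose ROH are longer than the population's shifts both metrics upward: a larger D and a heavier fitted tail. That is how these metrics carry information about recent inbreeding.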
Correlations between pedigree-based and the new genomic inbreeding coefficients ranged from 0.22 to 0.72, but most ranged from 0.60 to 0.70. The correlation between quantile chromosomal inbreeding coefficients (using molecular information of just one chromosome at a time) and chromosomal length was 0.84 (SE = 0.14), supporting the hypothesis that these coefficients incorporate information on ROH length as an indication of recent inbreeding. Kolmogorov–Smirnov and exponential chromosomal inbreeding coefficients were also correlated with chromosomal length (0.57). Chromosome 1 had the largest quantile ROH inbreeding coefficient (largest ROH sizes), whereas chromosome 10 had the lowest (shortest ROH sizes). Selection for lean growth increased ROH-based inbreeding coefficients for group S when compared to unselected groups C1 and C2. At the chromosomal level, this comparison showed that the level of autozygosity and the length of ROH for most of the autosomes increased in the selection line.
Quantile and exponential probability inbreeding coefficients using ROH length as a random variable provide additional information about recent inbreeding compared to existing inbreeding coefficients such as molecular, pedigree-based or total ROH content inbreeding coefficients.
Electronic supplementary material
The online version of this article (doi:10.1186/s12711-015-0153-1) contains supplementary material, which is available to authorized users.
Runs of homozygosity (ROH) may play a role in complex diseases. In the current study, we aimed to test whether ROHs are linked to the risk of autism and related language impairment. We analyzed 546,080 SNPs in 315 Han Chinese individuals affected with autism and 1,115 controls. ROH was defined as an extended homozygous haplotype spanning at least 500 kb. Relative extended haplotype homozygosity (REHH) for the trait-associated ROH region was calculated to search for signatures of selective sweeps. In total, we identified 676 ROH regions. An ROH region on 11q22.3 was significantly associated with speech delay (corrected p = 1.73 × 10⁻⁸). This region contains the NPAT and ATM genes, associated with ataxia telangiectasia, which is characterized by language impairment; the CUL5 (cullin 5) gene in the same region may modulate the neuronal migration process related to language functions. These three genes are highly expressed in the cerebellum. No evidence for recent positive selection was detected on the core haplotypes in this region. The same ROH region also showed a nominally significant association with speech delay in another independent sample (p = 0.037; combinatorial analysis Stouffer’s z trend = 0.0005). Taken together, our findings suggest that extended recessive loci on 11q22.3 may play a role in language impairment in autism. More research is warranted to investigate whether these genes influence speech pathology by perturbing cerebellar functions.
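The ROH definition used here (an extended homozygous stretch spanning at least 500 kb) can be illustrated with a simple run-finding sketch. It assumes genotypes coded as alternate-allele counts (0, 1, 2) at sorted SNP positions. The function `call_roh` is hypothetical, and unlike production ROH callers it lets any single heterozygous call break a run and does not handle missing genotypes.

```python
def call_roh(positions_bp, genotypes, min_length_bp=500_000, min_snps=2):
    """Return (start_bp, end_bp) for maximal runs of consecutive homozygous
    genotypes (coded 0 or 2) whose first-to-last SNP span is at least
    min_length_bp.

    Simplification: any heterozygous call (1) ends the current run.
    """
    if len(positions_bp) != len(genotypes):
        raise ValueError("positions and genotypes must have equal length")
    runs = []
    start = None  # index of the first SNP in the current homozygous run
    # Append a sentinel heterozygous call so the final run is closed too.
    for i, g in enumerate(list(genotypes) + [1]):
        if g in (0, 2):
            if start is None:
                start = i
        elif start is not None:
            end = i - 1
            span = positions_bp[end] - positions_bp[start]
            if end - start + 1 >= min_snps and span >= min_length_bp:
                runs.append((positions_bp[start], positions_bp[end]))
            start = None
    return runs


# Hypothetical SNP positions (bp) and genotypes for one chromosome segment.
positions = [0, 100_000, 300_000, 600_000, 700_000, 800_000, 1_500_000]
genotypes = [0, 2, 2, 0, 1, 2, 2]
print(call_roh(positions, genotypes))
```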
It is well known that inbreeding increases the risk of recessive monogenic diseases, but it is less certain whether it contributes to the etiology of complex diseases such as schizophrenia. One way to estimate the effects of inbreeding is to examine the association between disease diagnosis and genome-wide autozygosity estimated using runs of homozygosity (ROH) in genome-wide single nucleotide polymorphism arrays. Using data for schizophrenia from the Psychiatric Genomics Consortium (n = 21,868), Keller et al. (2012) estimated that the odds of developing schizophrenia increased by approximately 17% for every additional percent of the genome that is autozygous (β = 16.1, CI(β) = [6.93, 25.7], Z = 3.44, p = 0.0006). Here we describe replication results from 22 independent schizophrenia case-control datasets from the Psychiatric Genomics Consortium (n = 39,830). Using the same ROH calling thresholds and procedures as Keller et al. (2012), we were unable to replicate the significant association between ROH burden and schizophrenia in the independent PGC phase II data, although the effect was in the predicted direction, and the combined (original + replication) dataset yielded an attenuated but significant relationship between Froh and schizophrenia (β = 4.86, CI(β) = [0.90, 8.83], Z = 2.40, p = 0.02). Since Keller et al. (2012), several studies have reported inconsistent associations of ROH burden with complex traits, particularly in case-control data. These conflicting results might suggest that the effects of autozygosity are confounded by various factors, such as socioeconomic status, education, urbanicity, and religiosity, which may be associated with both real inbreeding and the outcome measures of interest.
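The 17% figure follows from the regression slope if, as assumed here, β is a logistic-regression coefficient on Froh expressed as a proportion between 0 and 1. One additional percent of autozygous genome is then an increase of 0.01 in Froh, so the odds are multiplied by exp(0.01 · β). The short calculation below (the helper name is hypothetical) applies this to the original and combined estimates.

```python
import math


def odds_ratio_per_percent(beta):
    """Convert a logistic-regression slope on Froh (a proportion in [0, 1])
    into the odds ratio for a 1-percentage-point increase in autozygosity."""
    return math.exp(beta * 0.01)


original = odds_ratio_per_percent(16.1)  # Keller et al. (2012) estimate
combined = odds_ratio_per_percent(4.86)  # original + replication estimate
ci_low = odds_ratio_per_percent(6.93)    # lower bound of the original CI(beta)
ci_high = odds_ratio_per_percent(25.7)   # upper bound of the original CI(beta)
print(f"original: {original:.3f}, combined: {combined:.3f}")
```

Under this reading, the combined estimate corresponds to roughly a 5% increase in odds per additional percent of autozygous genome, well below the originally reported 17%.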
It is well known that mating between relatives increases the risk that a child will have a rare recessive genetic disease, but there has also been increasing interest, and inconsistent findings, on whether inbreeding is a risk factor for common, complex psychiatric disorders such as schizophrenia. The best-powered study to date investigating this theory predicted that the odds of developing schizophrenia increase by approximately 17% for every additional percent of the genome that shows evidence of inbreeding. In this replication, we used genome-wide single nucleotide polymorphism data from 18,562 schizophrenia cases and 21,268 controls to quantify the degree to which they were inbred and to test the hypothesis that schizophrenia cases show higher mean levels of inbreeding. Contrary to the original study, we did not find evidence that distant inbreeding plays a role in schizophrenia risk. Various confounding factors could explain the discrepancy between the results of the original study and our replication, and this should serve as a cautionary note: careful attention should be paid to issues like ascertainment when using data from genome-wide case-control association studies for secondary analyses for which the data may not have originally been intended.
Regions of restricted genetic heterogeneity due to identity by descent (autozygosity) are known to confer susceptibility to a number of diseases. Regions of germline homozygosity (ROHs) of 1–2 Mb, the result of autozygosity, are detectable at high frequency in outbred populations. Recent studies have reported that ROHs, possibly through exposing recessive disease-causing alleles or alternative mechanisms, are associated with an increased cancer risk. To examine whether homozygosity is associated with breast or prostate cancer risk, we analysed 500K single-nucleotide polymorphism data from two genome-wide association studies conducted by the Cancer Genetics Markers of Susceptibility initiatives (http://cgems.cancer.gov/). Six common ROHs were associated with breast cancer risk and four with prostate cancer (P<0.01). Intriguingly, one of the breast cancer ROHs maps to 6q22.31–6q22.3, a region that has been previously shown to confer breast cancer risk. Although none of the ROHs remained significantly associated with cancer risk after adjustment for multiple testing, a number of ROHs merit further interrogation. However, our findings provide no strong evidence that levels of measured homozygosity, whatever their aetiology (autozygosity, uniparental isodisomy or hemizygosity), confer an increased risk of developing breast or prostate cancer in predominantly outbred populations.
homozygosity; risk; prostate; breast; cancer
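Testing whether a single consensus ROH is more common in cases than in controls, and then correcting for the number of ROHs tested, can be sketched as shown here. The choice of Fisher's exact test and a Bonferroni correction is an illustrative assumption, not necessarily the method of the cited study, and the function names are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table
        cases:    a carriers of the ROH, b non-carriers
        controls: c carriers of the ROH, d non-carriers
    summing the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed one."""
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)

    def prob(x):
        # Hypergeometric probability that the case-carrier cell equals x.
        return comb(row1, x) * comb(n - row1, col1 - x) / total

    p_obs = prob(a)
    p_value = 0.0
    for x in range(max(0, col1 - (n - row1)), min(row1, col1) + 1):
        p = prob(x)
        if p <= p_obs * (1 + 1e-7):  # small tolerance for floating-point ties
            p_value += p
    return min(p_value, 1.0)


def bonferroni_significant(p_values, alpha=0.05):
    """Flag p-values below alpha divided by the number of tests."""
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]
```

With many candidate ROHs, the Bonferroni threshold becomes strict, which is consistent with nominal P < 0.01 signals failing to survive correction.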
A central aim for studying runs of homozygosity (ROHs) in genome-wide SNP data is to detect the effects of autozygosity (stretches of the two homologous chromosomes within the same individual that are identical by descent) on phenotypes. However, it is unknown which current ROH detection program, and which set of parameters within a given program, is optimal for differentiating ROHs that are truly autozygous from ROHs that are homozygous at the marker level but vary at unmeasured variants between the markers.
We simulated 120 Mb of sequence data in order to know the true state of autozygosity. We then extracted common variants from this sequence to mimic the properties of SNP platforms and performed ROH analyses using three popular ROH detection programs, PLINK, GERMLINE, and BEAGLE. We varied detection thresholds for each program (e.g., prior probabilities, lengths of ROHs) to understand their effects on detecting known autozygosity.
Within the optimal thresholds for each program, PLINK outperformed GERMLINE and BEAGLE in detecting autozygosity from distant common ancestors. PLINK's sliding window algorithm worked best when using SNP data pruned for linkage disequilibrium (LD).
Our results provide both general and specific recommendations for maximizing autozygosity detection in genome-wide SNP data, and should apply equally well to research on whole-genome autozygosity burden or to research on whether specific autozygous regions are predictive using association mapping methods.
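PLINK's sliding-window approach can be sketched in simplified form. Windows of consecutive SNPs are scored as homozygous if they contain few heterozygous calls. Each SNP is kept if a large enough fraction of the windows overlapping it are homozygous, and consecutive kept SNPs form candidate ROHs that must meet minimum SNP-count and length criteria. This is a hypothetical re-implementation for illustration only: it is not PLINK's code, it ignores missing genotypes and SNP-density checks, and its parameter names and defaults are not PLINK's. LD pruning would be applied to the SNP set beforehand and is not shown.

```python
def sliding_window_roh(positions_bp, genotypes, window_snps=5, max_het=1,
                       hit_threshold=0.05, min_snps=5, min_length_bp=1_000_000):
    """Simplified sliding-window ROH detection on genotypes coded 0/1/2."""
    n = len(genotypes)
    if n < window_snps:
        return []
    hits = [0] * n      # homozygous windows overlapping each SNP
    overlaps = [0] * n  # all windows overlapping each SNP
    for start in range(n - window_snps + 1):
        window = genotypes[start:start + window_snps]
        is_homozygous = sum(1 for g in window if g == 1) <= max_het
        for i in range(start, start + window_snps):
            overlaps[i] += 1
            hits[i] += int(is_homozygous)
    # Every SNP is covered by at least one window when n >= window_snps.
    in_roh = [hits[i] / overlaps[i] >= hit_threshold for i in range(n)]

    segments, start = [], None
    for i, flag in enumerate(in_roh + [False]):  # sentinel closes the last run
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            end = i - 1
            span = positions_bp[end] - positions_bp[start]
            if end - start + 1 >= min_snps and span >= min_length_bp:
                segments.append((positions_bp[start], positions_bp[end]))
            start = None
    return segments


# Hypothetical example: 40 SNPs every 100 kb, homozygous then heterozygous.
positions = [i * 100_000 for i in range(40)]
print(sliding_window_roh(positions, [0] * 20 + [1] * 20))
```

Because SNPs are kept based on overlapping windows, a called segment can absorb an isolated heterozygous call at its edge, which is one way such algorithms tolerate occasional genotyping errors.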
Genome-wide association studies (GWASs) of major depressive disorder (MDD) have yet to identify variants that surpass the threshold for genome-wide significance. A recent study reported that runs of homozygosity (ROH) are associated with schizophrenia, reflecting a novel genetic risk factor resulting from increased parental relatedness and recessive genetic effects. Here we undertake an analysis of ROH for MDD using the 9,238 MDD cases and 9,521 controls reported in a recent mega-analysis of 9 GWAS. Since evidence for association with ROH could reflect a recessive mode of action at specific loci, we also conducted a genome-wide association analysis under a recessive model.
The genome-wide association analysis using a recessive model found no significant associations. Our analysis of ROH showed significant heterogeneity of effect across studies (p = 0.001), and this heterogeneity was associated with genotyping platform and country of origin. The results of the ROH analysis show that differences across studies can lead to conflicting systematic genome-wide differences between cases and controls that are unaccounted for by traditional covariates. They highlight the sensitivity of the ROH method to spurious associations and the need to carefully control for potential confounds in such analyses. We found no strong evidence for a recessive model underlying MDD.
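Under the recessive model mentioned above, each genotype is recoded as 1 for risk-allele homozygotes and 0 otherwise, and the frequency of that indicator is compared between cases and controls. The sketch here uses a plain 2×2 Pearson chi-square test for brevity; a real GWAS would typically use logistic regression with covariates such as ancestry components, and the function names are hypothetical.

```python
import math


def recessive_code(genotype):
    """Code an additive genotype (count of risk alleles: 0, 1, 2) under a
    recessive model: 1 only for risk-allele homozygotes."""
    if genotype not in (0, 1, 2):
        raise ValueError("genotype must be 0, 1, or 2")
    return 1 if genotype == 2 else 0


def recessive_chi2_test(case_genotypes, control_genotypes):
    """Pearson chi-square (1 df, no continuity correction) comparing the
    proportion of risk-allele homozygotes in cases versus controls."""
    a = sum(map(recessive_code, case_genotypes))     # case homozygotes
    b = len(case_genotypes) - a
    c = sum(map(recessive_code, control_genotypes))  # control homozygotes
    d = len(control_genotypes) - c
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0  # degenerate table: no evidence either way
    chi2 = n * (a * d - b * c) ** 2 / denom
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p


# Hypothetical genotype counts at one SNP.
cases = [2] * 30 + [1] * 40 + [0] * 30
controls = [2] * 10 + [1] * 50 + [0] * 40
chi2, p = recessive_chi2_test(cases, controls)
print(f"chi2={chi2:.2f}, p={p:.2e}")
```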
Inbreeding has long been recognized as a primary cause of fitness reduction in both wild and domesticated populations. Consanguineous matings cause inheritance of haplotypes that are identical by descent (IBD) and result in homozygous stretches along the genome of the offspring. The size and position of regions of homozygosity (ROHs) are expected to correlate with genomic features such as GC content and recombination rate, but also with the direction of selection. Thus, ROHs should be non-randomly distributed across the genome, and demographic history may therefore not fully predict the effects of inbreeding. The porcine genome has a relatively heterogeneous distribution of recombination rate, making Sus scrofa an excellent model to study the influence of both recombination landscape and demography on genomic variation. This study uses next-generation sequencing data to analyze genomic ROH patterns with a comparative sliding-window approach. We present an in-depth study of genomic variation based on three parameters: nucleotide diversity outside ROHs, the number of ROHs in the genome, and the average ROH size. We identified an abundance of ROHs in the genomes of multiple pigs from commercial breeds and wild populations from Eurasia. The size and number of ROHs agree with the known demography of the populations, with population bottlenecks greatly increasing ROH occurrence. Nucleotide diversity outside ROHs is high in populations derived from a large ancient population, regardless of current population size. In addition, we show an unequal genomic ROH distribution, with strong correlations of ROH size and abundance with recombination rate and GC content. Global gene content does not correlate with ROH frequency, but some ROH hotspots do contain positively selected genes in commercial lines and wild populations.
This study highlights the importance of the influence of demography and recombination on homozygosity in the genome to understand the effects of inbreeding.
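The first of the three parameters, nucleotide diversity outside ROHs, can be approximated for a single genome as heterozygous sites per base pair after excluding ROH intervals. The sketch below assumes half-open [start, end) ROH intervals lying within the chromosome and 0-based heterozygous-site positions. The function names are hypothetical, and the study's exact estimator and windowing scheme may differ.

```python
def merge_intervals(intervals):
    """Merge overlapping or adjacent half-open [start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def diversity_outside_roh(het_positions, roh_intervals, chrom_length):
    """Heterozygous sites per bp, counting only bases outside ROHs."""
    rohs = merge_intervals(roh_intervals)  # avoid double-counting overlaps
    roh_bp = sum(end - start for start, end in rohs)
    outside_bp = chrom_length - roh_bp
    if outside_bp <= 0:
        raise ValueError("no sequence outside ROHs")

    def in_roh(pos):
        return any(start <= pos < end for start, end in rohs)

    het_outside = sum(1 for pos in het_positions if not in_roh(pos))
    return het_outside / outside_bp


# Hypothetical 1 kb chromosome with two overlapping ROH calls.
rate = diversity_outside_roh([50, 150, 350, 500, 999], [(100, 300), (250, 400)], 1000)
print(f"{rate:.5f} heterozygous sites per bp outside ROHs")
```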
Small populations have an increased risk of inbreeding depression due to a higher expression of deleterious alleles. This can have major consequences for the viability of these populations. Maintaining genetic diversity is essential in domesticated species like the pig that are artificially selected in breeding populations, but also in wild populations that experience habitat decline. Recent advances in sequencing technology have enabled us to identify patterns of nucleotide variation in individual genomes. We screened the full genomes of wild boars and commercial pigs from Eurasia for regions of homozygosity. We found that these regions of homozygosity were caused by the demographic history and effective population size of the pigs. European wild boars are the least variable, but European breeds also contain large homozygous stretches in their genomes. Moreover, the likelihood of a region becoming depleted of variation depends on its position in the genome, because variation is highly correlated with recombination rate. The telomeric regions are much more variable, and the central regions of chromosomes have a higher chance of containing long regions of homozygosity. These findings increase knowledge of the fine-scale architecture of genomic variation and are particularly important for population genetic management.
Runs of homozygosity (ROH) are contiguous lengths of homozygous genotypes that are present in an individual due to parents transmitting identical haplotypes to their offspring. The extent and frequency of ROHs may inform on the ancestry of an individual and its population. Here we use high-density (n = 777,962) bi-allelic SNPs in a range of cattle breed samples to correlate ROH with pedigree-based inbreeding coefficients and to validate subsequent analyses using 54,001 SNP genotypes. This study provides a first test of the inferences drawn from ROH, through comparison with inbreeding estimates calculated from the detailed pedigree data available for several breeds.
All animals genotyped on the HD panel displayed at least one ROH between 1 and 5 Mb in length, and certain regions of the genome were more likely than others to be involved in a ROH. Strong correlations (r = 0.75, p < 0.0001) existed between the pedigree-based inbreeding coefficient and a statistic based on the sum of ROH of length > 0.5 KB, suggesting that, in the absence of an animal’s pedigree data, the extent of the genome under ROH may be used to infer aspects of recent population history even from relatively few samples.
Our findings suggest that ROH are frequent across all breeds but differing patterns of ROH length and burden illustrate variations in breed origins and recent management.
Runs of homozygosity; Inbreeding; Cattle population history
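A statistic of this kind is commonly defined as Froh, the fraction of the autosomal genome covered by ROHs above a minimum length. It can then be correlated with pedigree coefficients across animals. In the sketch below, the 0.5 Mb default cut-off, the function names, the assumed autosome length, and the animal values are all hypothetical choices for illustration, and ROHs are assumed not to overlap.

```python
import math


def f_roh(roh_lengths_bp, autosome_length_bp, min_length_bp=500_000):
    """Proportion of the autosomal genome covered by ROHs longer than
    min_length_bp (assumes the ROHs do not overlap)."""
    covered = sum(length for length in roh_lengths_bp if length > min_length_bp)
    return covered / autosome_length_bp


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical animals: pedigree F and ROH lengths (bp).
AUTOSOME_BP = 2_500_000_000  # assumed autosomal length, for illustration only
pedigree_f = [0.02, 0.05, 0.10, 0.08]
roh_sets = [
    [600_000, 3_000_000],
    [2_000_000, 40_000_000],
    [90_000_000, 120_000_000, 400_000],
    [50_000_000, 60_000_000],
]
genomic_f = [f_roh(rohs, AUTOSOME_BP) for rohs in roh_sets]
print(f"r = {pearson_r(pedigree_f, genomic_f):.2f}")
```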
The intensive selection programs for milk, made possible by mass artificial insemination, have tremendously increased the similarity among the genomes of North American (NA) Holsteins since the 1960s. This migration of elite alleles has caused certain regions of the genome to have runs of homozygosity (ROH) occasionally spanning millions of continuous base pairs at a specific locus. In this study, genome signatures of artificial selection in NA Holsteins born between 1953 and 2008 were identified by comparing changes in ROH between three distinct groups under different selective pressure for milk production. The ROH regions were also used to estimate the inbreeding coefficients. The comparisons of genomic autozygosity between groups selected or unselected since 1964 for milk production revealed significant differences with respect to overall ROH frequency and distribution. These results indicate that selection has increased overall autozygosity across the genome, whereas the autozygosity in an unselected line has not changed significantly across most of the chromosomes. In addition, ROH distribution was more variable across the genomes of selected animals, compared with a more even ROH distribution for unselected animals. Further analysis of genome-wide autozygosity changes and of the association between traits and haplotypes identified more than 40 genomic regions under selection on several chromosomes (Chr), including Chr 2, 7, 16 and 20. Many of these selection signatures corresponded to quantitative trait loci for milk, fat, and protein yield previously found in contemporary Holsteins.
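Comparing ROH frequency along the genome between selected and unselected groups, as described above, amounts to computing, for each SNP, the fraction of animals whose ROHs cover it, then looking for SNPs where that fraction rises in the selected group. The sketch below assumes inclusive [start, end] ROH coordinates in bp. The function names and the 0.3 difference cut-off are hypothetical and not taken from the study.

```python
def roh_incidence(snp_positions, animals_rohs):
    """Fraction of animals whose ROHs (inclusive [start, end] in bp)
    cover each SNP position."""
    n = len(animals_rohs)
    incidence = []
    for pos in snp_positions:
        covered = sum(
            1 for rohs in animals_rohs
            if any(start <= pos <= end for start, end in rohs)
        )
        incidence.append(covered / n)
    return incidence


def incidence_shift(snp_positions, selected, unselected, min_diff=0.3):
    """SNPs where ROH incidence in the selected group exceeds that in the
    unselected group by at least min_diff (a hypothetical cut-off)."""
    sel = roh_incidence(snp_positions, selected)
    uns = roh_incidence(snp_positions, unselected)
    return [pos for pos, s, u in zip(snp_positions, sel, uns) if s - u >= min_diff]


# Hypothetical ROH calls: one list of (start, end) tuples per animal.
snps = [100, 200, 300]
selected_group = [[(50, 250)], [(150, 350)]]
unselected_group = [[(280, 320)], []]
print(incidence_shift(snps, selected_group, unselected_group))
```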
Runs of homozygosity (ROH), regions of the genome containing many consecutive homozygous SNPs, may represent two copies of a haplotype inherited from a common ancestor. A rare variant on this haplotype could thus be present in a homozygous and potentially recessive state. To detect rare risk variants for schizophrenia, we performed an ROH analysis in a homogeneous Irish genome wide association study (GWAS) dataset consisting of 1606 cases and 1794 controls. There was no genome-wide excess of ROH in cases compared to controls (p = 0.7986). No consensus ROH at individual loci showed association with schizophrenia after genome-wide correction.
Runs of homozygosity; GWAS; Schizophrenia; Rare variant; Mutation
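A genome-wide excess of ROH in cases can be tested by comparing mean ROH burden (for example, total ROH length per person) between cases and controls and calibrating the difference with label permutations. This is a minimal sketch with hypothetical names. The cited study's exact test statistic is not described here.

```python
import random


def roh_burden_permutation_test(case_burden, control_burden, n_perm=10_000, seed=1):
    """Two-sided permutation test for a difference in mean ROH burden
    (e.g. total ROH length per individual) between cases and controls."""
    def mean(values):
        return sum(values) / len(values)

    observed = abs(mean(case_burden) - mean(control_burden))
    pooled = list(case_burden) + list(control_burden)
    n_cases = len(case_burden)
    rng = random.Random(seed)  # fixed seed for reproducible p-values
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign case/control labels
        diff = abs(mean(pooled[:n_cases]) - mean(pooled[n_cases:]))
        if diff >= observed:
            extreme += 1
    return (extreme + 1) / (n_perm + 1)  # add-one correction avoids p = 0
```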