Runs of homozygosity (ROH) are extended tracts of adjacent homozygous single nucleotide polymorphisms (SNPs) that are more common in unrelated individuals than previously thought. It has been proposed that estimating ROH on a genome-wide level, using genome-wide SNP data, will make it possible to identify recessive variants underlying complex traits. Here, we examined ROH larger than 1.5 Mb individually and in combination for association with survival in 5974 participants of the Rotterdam Study. In addition, we assessed the role of overall homozygosity, expressed as the percentage of the autosomal genome that is in ROH longer than 1.5 Mb, on survival during a mean follow-up period of 12 years. None of these measures of homozygosity was associated with survival to old age.
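The overall homozygosity measure described above (the percentage of the autosomal genome in ROH longer than 1.5 Mb) can be sketched as follows. This is a minimal illustration, not the study's own pipeline: the function name `froh_percent`, the `(start, end)` segment representation, and the caller-supplied autosomal length are all assumptions introduced here.

```python
from typing import Iterable, Tuple


def froh_percent(
    segments: Iterable[Tuple[int, int]],
    autosome_length_bp: int,
    min_length_bp: int = 1_500_000,
) -> float:
    """Percentage of the autosomal genome covered by ROH longer than a threshold.

    segments: hypothetical (start_bp, end_bp) pairs for one individual,
              assumed non-overlapping and already restricted to autosomes.
    autosome_length_bp: total autosomal length covered by the SNP panel;
              supplied by the caller because it depends on the genome build.
    """
    if autosome_length_bp <= 0:
        raise ValueError("autosome_length_bp must be positive")
    # Keep only runs strictly longer than the threshold (1.5 Mb by default).
    covered = sum(
        end - start for start, end in segments if end - start > min_length_bp
    )
    return 100.0 * covered / autosome_length_bp


# Toy example: one 2 Mb run counts, one 1 Mb run is below the threshold.
example = froh_percent([(0, 2_000_000), (5_000_000, 6_000_000)], 100_000_000)
```

The threshold is a parameter because the abstracts collected here use different cut-offs (0.5 Mb, 1 Mb, 1.5 Mb).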
Runs of homozygosity (ROH) represent extended stretches of homozygous genotypes spanning a long genomic distance. In oncology, an ROH is known as loss of heterozygosity (LOH) if it is identified exclusively in cancer cells rather than in matched control cells. Studies have identified several genomic regions which show consistent ROH in different kinds of carcinoma. To query whether this consistency can be observed on a broader spectrum, both in more cancer types and in wider genomic regions, we investigated ROH patterns in the National Cancer Institute 60 cancer cell line panel (NCI-60) and HapMap Caucasian healthy trio families. Using results from Affymetrix 500 K SNP arrays, we report a genome-wide significant association of ROH regions between the NCI-60 and HapMap samples, with a much higher level of ROH (11 fold) in the cancer cell lines. Analysis shows that the more severe ROH found in cancer cells appears to be an extension of ROH already present in the healthy state. In the HapMap trios, the adult subgroup had a slightly but significantly higher level (1.02 fold) of ROH than did the young subgroup. For several ROH regions we observed the co-occurrence of fragile sites (FRAs). However, FRAs at the genome-wide level do not show a clear relationship with ROH regions.
Runs of homozygosity (ROH) may play a role in complex diseases. In the current study, we aimed to test whether ROHs are linked to the risk of autism and related language impairment. We analyzed 546,080 SNPs in 315 Han Chinese affected with autism and 1,115 controls. ROH was defined as an extended homozygous haplotype spanning at least 500 kb. Relative extended haplotype homozygosity (REHH) for the trait-associated ROH region was calculated to search for the signature of selective sweeps. In total, we identified 676 ROH regions. An ROH region on 11q22.3 was significantly associated with speech delay (corrected p = 1.73×10−8). This region contains the NPAT and ATM genes associated with ataxia telangiectasia characterized by language impairment; the CUL5 (cullin 5) gene in the same region may modulate the neuronal migration process related to language functions. These three genes are highly expressed in the cerebellum. No evidence for recent positive selection was detected on the core haplotypes in this region. The same ROH region was also nominally significantly associated with speech delay in another independent sample (p = 0.037; combinatorial analysis Stouffer’s z trend = 0.0005). Taken together, our findings suggest that extended recessive loci on 11q22.3 may play a role in language impairment in autism. More research is warranted to investigate whether these genes influence speech pathology by perturbing cerebellar functions.
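The combined analysis above refers to Stouffer's z method for pooling evidence across independent samples. A minimal sketch of that method is given below, assuming one-sided p-values and optional weights (for example, the square root of each sample size). The function name and the weighting choice are assumptions made here; the abstract does not state which p-values or weights entered the reported combined value, so no attempt is made to reproduce it.

```python
from math import sqrt
from statistics import NormalDist
from typing import Optional, Sequence


def stouffer_combined_p(
    p_values: Sequence[float],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Combine independent one-sided p-values with Stouffer's z method.

    Each p-value must lie strictly between 0 and 1.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if weights is None:
        weights = [1.0] * len(p_values)  # unweighted version
    if len(weights) != len(p_values):
        raise ValueError("weights and p_values must have the same length")

    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Convert each p-value to the z-score with that upper-tail probability.
    z_scores = [std_normal.inv_cdf(1.0 - p) for p in p_values]
    combined_z = sum(w * z for w, z in zip(weights, z_scores)) / sqrt(
        sum(w * w for w in weights)
    )
    # Upper-tail probability of the combined z-score.
    return 1.0 - std_normal.cdf(combined_z)


# Toy example: two studies that each reach p = 0.05 in the same direction.
pooled = stouffer_combined_p([0.05, 0.05])
```

Two consistent nominal results pool into a smaller p-value, which is the reason a discovery signal and a nominal replication can jointly be more convincing than either alone.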
Runs of homozygosity (ROHs) are a class of important but poorly studied genomic variations and may be involved in individual susceptibility to diseases. To better understand ROH and its relationship with lung cancer, we performed a genome-wide ROH analysis of a subset of a previous genome-wide case-control study (1,473 cases and 1,962 controls) in a Han Chinese population. ROHs were classified by length into two classes, intermediate and long ROHs, to evaluate their association with lung cancer risk using existing genome-wide single nucleotide polymorphism (SNP) data. We found that the overall level of intermediate ROHs was significantly associated with a decreased risk of lung cancer (odds ratio = 0.63; 95% confidence interval: 0.51-0.77; P = 4.78×10−6), while the long ROHs seemed to be a risk factor for lung cancer. We also identified one ROH region at 14q23.1 that was consistently associated with lung cancer risk in the study. These results indicate that ROHs may be a new class of variation associated with lung cancer risk, and that genetic variants at 14q23.1 may be involved in the development of lung cancer.
lung cancer; runs of homozygosity (ROHs); genome-wide association study
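The abstract above reports an odds ratio with a 95% confidence interval. As an illustration of how such a figure is read, the sketch below computes an unadjusted odds ratio and a Wald interval from a 2x2 table of counts. This is an assumption-laden simplification: the study most likely used regression with covariates, and the function name and the toy counts are hypothetical.

```python
from math import exp, log, sqrt
from typing import Tuple


def odds_ratio_with_ci(
    exposed_cases: int,
    exposed_controls: int,
    unexposed_cases: int,
    unexposed_controls: int,
    z: float = 1.96,  # normal quantile for an approximate 95% interval
) -> Tuple[float, float, float]:
    """Unadjusted odds ratio with a Wald confidence interval on the log scale."""
    a, b, c, d = exposed_cases, exposed_controls, unexposed_cases, unexposed_controls
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Toy counts, not taken from the study.
or_value, ci_low, ci_high = odds_ratio_with_ci(10, 20, 30, 40)
```

An interval that lies entirely below 1, as in the reported 0.51-0.77, indicates a protective association at the chosen confidence level.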
Regions of restricted genetic heterogeneity due to identity by descent (autozygosity) are known to confer susceptibility to a number of diseases. Regions of germline homozygosity (ROHs) of 1–2 Mb, the result of autozygosity, are detectable at high frequency in outbred populations. Recent studies have reported that ROHs, possibly through exposing recessive disease-causing alleles or alternative mechanisms, are associated with an increased cancer risk. To examine whether homozygosity is associated with breast or prostate cancer risk, we analysed 500K single-nucleotide polymorphism data from two genome-wide association studies conducted by the Cancer Genetics Markers of Susceptibility initiatives (http://cgems.cancer.gov/). Six common ROHs were associated with breast cancer risk and four with prostate cancer (P<0.01). Intriguingly, one of the breast cancer ROHs maps to 6q22.31–6q22.3, a region that has been previously shown to confer breast cancer risk. Although none of the ROHs remained significantly associated with cancer risk after adjustment for multiple testing, a number of ROHs merit further interrogation. However, our findings provide no strong evidence that levels of measured homozygosity, whatever their aetiology (autozygosity, uniparental isodisomy or hemizygosity), confer an increased risk of developing breast or prostate cancer in predominantly outbred populations.
homozygosity; risk; prostate; breast; cancer
The recent development of high-resolution DNA microarrays, in which hundreds of thousands of single nucleotide polymorphisms (SNPs) are genotyped, enables the rapid identification of susceptibility genes for complex diseases. Clusters of these SNPs may show runs of homozygosity (ROHs) that can be analyzed for association with disease. An analysis of patients whose parents were first cousins enables the search for autozygous segments in their offspring. Here, using the Affymetrix® Genome-Wide Human SNP Array 5.0 to determine ROHs, we genotyped 9 individuals with schizophrenia (SCZ) whose parents were first cousins. We identified overlapping ROHs on chromosomes 1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 16, 17, 19, 20, and 21 in at least 3 individuals. Only the locus on chromosome 5 has been reported previously. The ROHs on chromosome 5q23.3–q31.1 include the candidate genes histidine triad nucleotide binding protein 1 (HINT1) and acyl-CoA synthetase long-chain family member 6 (ACSL6). Other overlapping ROHs may contain novel rare recessive variants that affect SCZ specifically in our samples, given the highly heterogeneous nature of SCZ. Analysis of patients whose parents are first cousins may provide new insights for the genetic analysis of psychiatric diseases.
A central aim for studying runs of homozygosity (ROHs) in genome-wide SNP data is to detect the effects of autozygosity (stretches of the two homologous chromosomes within the same individual that are identical by descent) on phenotypes. However, it is unknown which current ROH detection program, and which set of parameters within a given program, is optimal for differentiating ROHs that are truly autozygous from ROHs that are homozygous at the marker level but vary at unmeasured variants between the markers.
We simulated 120 Mb of sequence data in order to know the true state of autozygosity. We then extracted common variants from this sequence to mimic the properties of SNP platforms and performed ROH analyses using three popular ROH detection programs, PLINK, GERMLINE, and BEAGLE. We varied detection thresholds for each program (e.g., prior probabilities, lengths of ROHs) to understand their effects on detecting known autozygosity.
Within the optimal thresholds for each program, PLINK outperformed GERMLINE and BEAGLE in detecting autozygosity from distant common ancestors. PLINK's sliding window algorithm worked best when using SNP data pruned for linkage disequilibrium (LD).
Our results provide both general and specific recommendations for maximizing autozygosity detection in genome-wide SNP data, and should apply equally well to research on whole-genome autozygosity burden or to research on whether specific autozygous regions are predictive using association mapping methods.
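To make the idea of marker-level ROH detection concrete, the sketch below scans one chromosome for maximal stretches of consecutive homozygous calls and keeps those that pass SNP-count and length thresholds. It is a deliberately naive consecutive-run scanner, not the sliding-window algorithm of PLINK or the methods of GERMLINE or BEAGLE; the genotype coding (0/1/2 minor-allele counts, `None` for missing), the function name, and the thresholds are assumptions for illustration.

```python
from typing import List, Optional, Sequence, Tuple


def find_homozygous_runs(
    positions: Sequence[int],
    genotypes: Sequence[Optional[int]],
    min_snps: int,
    min_length_bp: int,
) -> List[Tuple[int, int, int]]:
    """Return (start_bp, end_bp, n_snps) for each qualifying homozygous run.

    positions: sorted base-pair positions of SNPs on one chromosome.
    genotypes: 0 or 2 = homozygous, 1 = heterozygous, None = missing.
    A heterozygous or missing call ends the current run (a strict rule;
    real tools usually tolerate a few such calls per window).
    """
    if len(positions) != len(genotypes):
        raise ValueError("positions and genotypes must have the same length")

    runs: List[Tuple[int, int, int]] = []
    run_start: Optional[int] = None
    # A trailing heterozygous sentinel flushes a run that reaches the end.
    for i, call in enumerate(list(genotypes) + [1]):
        if call in (0, 2):
            if run_start is None:
                run_start = i
        elif run_start is not None:
            first, last = run_start, i - 1
            n_snps = last - first + 1
            length_bp = positions[last] - positions[first]
            if n_snps >= min_snps and length_bp >= min_length_bp:
                runs.append((positions[first], positions[last], n_snps))
            run_start = None
    return runs


# Toy data: only the first three SNPs form a run that passes both thresholds.
toy_runs = find_homozygous_runs(
    [100, 200, 300, 400, 500, 600, 700],
    [0, 2, 2, 1, 0, 0, None],
    min_snps=3,
    min_length_bp=150,
)
```

A run found this way is homozygous only at the genotyped markers; as the abstract stresses, whether it is truly autozygous depends on unmeasured variants between markers, which is why the choice of thresholds matters.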
Runs of homozygosity (ROH) are contiguous lengths of homozygous genotypes that are present in an individual due to parents transmitting identical haplotypes to their offspring. The extent and frequency of ROHs may inform on the ancestry of an individual and its population. Here we use high density (n = 777,962) bi-allelic SNPs in a range of cattle breed samples to correlate ROH with the pedigree-based inbreeding coefficients and to validate subsequent analyses using 54,001 SNP genotypes. This study provides a first testing of the inference drawn from ROH through comparison with estimates of inbreeding from calculations based on the detailed pedigree data available for several breeds.
All animals genotyped on the HD panel displayed at least one ROH between 1 and 5 Mb in length, with certain regions of the genome more likely to be involved in a ROH than others. A strong correlation (r = 0.75, p < 0.0001) existed between the pedigree-based inbreeding coefficient and a statistic based on the sum of ROH of length > 0.5 KB, suggesting that in the absence of an animal’s pedigree data, the extent of a genome under ROH may be used to infer aspects of recent population history even from relatively few samples.
Our findings suggest that ROH are frequent across all breeds but differing patterns of ROH length and burden illustrate variations in breed origins and recent management.
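The comparison of pedigree-based inbreeding coefficients with an ROH-based statistic reported above rests on a Pearson correlation across animals. A minimal sketch of that calculation follows; the function name and the toy values are hypothetical and are not data from the study.

```python
from math import sqrt
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length, at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of cross-products and squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / sqrt(sxx * syy)


# Toy values: pedigree inbreeding coefficient vs. fraction of genome in ROH.
pedigree_f = [0.00, 0.03, 0.06, 0.12]
roh_fraction = [0.02, 0.04, 0.05, 0.11]
r = pearson_r(pedigree_f, roh_fraction)
```

The two measures are not expected to agree perfectly: a pedigree coefficient is an expectation given recorded ancestry, whereas an ROH-based statistic reflects the realized genome of the animal.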
Runs of homozygosity; Inbreeding; Cattle population history
Parkinson's disease (PD) occurs in both familial and sporadic forms, and both monogenic and complex genetic factors have been identified. Early onset PD (EOPD) is particularly associated with autosomal recessive (AR) mutations, and three genes, PARK2, PARK7 and PINK1, have been found to carry mutations leading to AR disease. Since mutations in these genes account for less than 10% of EOPD patients, we hypothesized that further recessive genetic factors are involved in this disorder, which may appear in extended runs of homozygosity.
We carried out genome-wide SNP genotyping to look for extended runs of homozygosity (ROHs) in 1,445 EOPD cases and 6,987 controls. Logistic regression analyses showed an increased level of genomic homozygosity in EOPD cases compared to controls. These differences are larger for ROH of 9 Mb and above, where there is a more than three-fold increase in the proportion of cases carrying a ROH. These differences are not explained by occult recessive mutations at existing loci. Controlling for genome-wide homozygosity in logistic regression analyses increased the differences between cases and controls, indicating that in EOPD cases ROHs do not simply relate to genome-wide measures of inbreeding. Homozygosity at a locus on chromosome 19p13.3 was identified as being more common in EOPD cases than in controls. Sequencing analysis of genes and predicted transcripts within this locus failed to identify a novel mutation causing EOPD in our cohort.
There is an increased rate of genome wide homozygosity in EOPD, as measured by an increase in ROHs. These ROHs are a signature of inbreeding and do not necessarily harbour disease-causing genetic variants. Although there might be other regions of interest apart from chromosome 19p13.3, we lack the power to detect them with this analysis.
The intensive selection programs for milk made possible by mass artificial insemination increased the similarity among the genomes of North American (NA) Holsteins tremendously since the 1960s. This migration of elite alleles has caused certain regions of the genome to have runs of homozygosity (ROH) occasionally spanning millions of continuous base pairs at a specific locus. In this study, genome signatures of artificial selection in NA Holsteins born between 1953 and 2008 were identified by comparing changes in ROH between three distinct groups under different selective pressure for milk production. The ROH regions were also used to estimate the inbreeding coefficients. The comparisons of genomic autozygosity between groups selected or unselected since 1964 for milk production revealed significant differences with respect to overall ROH frequency and distribution. These results indicate selection has increased overall autozygosity across the genome, whereas the autozygosity in an unselected line has not changed significantly across most of the chromosomes. In addition, ROH distribution was more variable across the genomes of selected animals in comparison to a more even ROH distribution for unselected animals. Further analysis of genome-wide autozygosity changes and the association between traits and haplotypes identified more than 40 genomic regions under selection on several chromosomes (Chr) including Chr 2, 7, 16 and 20. Many of these selection signatures corresponded to quantitative trait loci for milk, fat, and protein yield previously found in contemporary Holsteins.
The human genome is characterised by many runs of homozygous genotypes, where identical haplotypes were inherited from each parent. The length of each run is determined partly by the number of generations since the common ancestor: offspring of cousin marriages have long runs of homozygosity (ROH), while the numerous shorter tracts relate to shared ancestry tens and hundreds of generations ago. Human populations have experienced a wide range of demographic histories and hold diverse cultural attitudes to consanguinity. In a global population dataset, genome-wide analysis of long and shorter ROH allows categorisation of the mainly indigenous populations sampled here into four major groups in which the majority of the population are inferred to have: (a) recent parental relatedness (south and west Asians); (b) shared parental ancestry arising hundreds to thousands of years ago through long term isolation and restricted effective population size (Ne), but little recent inbreeding (Oceanians); (c) both ancient and recent parental relatedness (Native Americans); and (d) only the background level of shared ancestry relating to continental Ne (predominantly urban Europeans and East Asians; lowest of all in sub-Saharan African agriculturalists), and the occasional cryptically inbred individual. Moreover, individuals can be positioned along axes representing this demographic historic space. Long runs of homozygosity are therefore a globally widespread and under-appreciated characteristic of our genomes, which record past consanguinity and population isolation and provide a distinctive record of the demographic history of an individual's ancestors. Individual ROH measures will also allow quantification of the disease risk arising from polygenic recessive effects.
Inbreeding has long been recognized as a primary cause of fitness reduction in both wild and domesticated populations. Consanguineous matings cause inheritance of haplotypes that are identical by descent (IBD) and result in homozygous stretches along the genome of the offspring. Size and position of regions of homozygosity (ROHs) are expected to correlate with genomic features such as GC content and recombination rate, but also direction of selection. Thus, ROHs should be non-randomly distributed across the genome. Therefore, demographic history may not fully predict the effects of inbreeding. The porcine genome has a relatively heterogeneous distribution of recombination rate, making Sus scrofa an excellent model to study the influence of both recombination landscape and demography on genomic variation. This study utilizes next-generation sequencing data for the analysis of genomic ROH patterns, using a comparative sliding window approach. We present an in-depth study of genomic variation based on three different parameters: nucleotide diversity outside ROHs, the number of ROHs in the genome, and the average ROH size. We identified an abundance of ROHs in all genomes of multiple pigs from commercial breeds and wild populations from Eurasia. Size and number of ROHs are in agreement with known demography of the populations, with population bottlenecks highly increasing ROH occurrence. Nucleotide diversity outside ROHs is high in populations derived from a large ancient population, regardless of current population size. In addition, we show an unequal genomic ROH distribution, with strong correlations of ROH size and abundance with recombination rate and GC content. Global gene content does not correlate with ROH frequency, but some ROH hotspots do contain positively selected genes in commercial lines and wild populations.
This study highlights the importance of the influence of demography and recombination on homozygosity in the genome to understand the effects of inbreeding.
Small populations have an increased risk of inbreeding depression due to a higher expression of deleterious alleles. This can have major consequences for the viability of these populations. In domesticated species like the pig that are artificially selected in breeding populations, but also in wild populations that experience habitat decline, maintaining genetic diversity is essential. Recent advances in sequence technology enabled us to identify patterns of nucleotide variation in individual genomes. We screened the full genome of wild boars and commercial pigs from Eurasia for regions of homozygosity. We found these regions of homozygosity were caused by the demographic history and effective population size of the pigs. European wild boars are least variable, but also European breeds contain large homozygous stretches in their genome. Moreover, the likelihood of a region becoming depleted depends on its position in the genome, because variation has a high correlation with recombination rate. The telomeric regions are much more variable, and the central region of chromosomes has a higher chance of containing long regions of homozygosity. These findings increase knowledge on the fine-scaled architecture of genomic variation, and they are particularly important for population genetic management.
The peculiar position of Sardinia in the Mediterranean Sea has rendered its population an interesting biogeographical isolate. The aim of this study was to investigate the genetic population structure, as well as to estimate Runs of Homozygosity and regions under positive selection, using about 1.2 million single nucleotide polymorphisms genotyped in 1077 Sardinian individuals. Using four different methods (fixation index, inflation factor, principal component analysis and ancestry estimation) we were able to highlight, as expected for a genetic isolate, the high internal homogeneity of the island. Sardinians showed a higher percentage of genome covered by RoHs>0.5 Mb (FRoH%0.5) when compared to peninsular Italians, with the only exception of the area surrounding Alghero. We furthermore identified 9 genomic regions showing signs of positive selection, and we recaptured many previously inferred signals. Other regions harbor novel candidate genes for positive selection, like TMEM252, or regions containing long non-coding RNA. With the present study we confirmed the high genetic homogeneity of Sardinia, which may be explained by shared ancestry combined with the action of evolutionary forces.
Genome-wide studies on autism spectrum disorders (ASDs) have mostly focused on large-scale population samples, but examination of rare variations in isolated populations may provide additional insights into the disease pathogenesis.
As a first step in the genetic analysis of ASD in Croatia, we characterized genetic variation in a sample of 103 subjects with ASD and 203 control individuals, who were genotyped using the Illumina HumanHap550 BeadChip. We analyzed the genetic diversity of the Croatian population and its relationship to other populations, the degree of relatedness via Runs of Homozygosity (ROHs), and the distribution of large (>500 Kb) copy number variations.
Combining the Croatian cohort with several previously published populations in the FastME analysis (an alternative to Neighbor Joining) revealed that Croatian subjects cluster, as expected, with Southern Europeans; in addition, individuals from the same geographic region within Europe cluster together. Whereas Croatian subjects could be separated from a sample of healthy control subjects of European origin from North America, Croatian ASD cases and controls are well mixed. A comparison of runs of homozygosity indicated that the number and the median length of regions of homozygosity are higher for ASD subjects than for controls (p = 6 × 10−3). Furthermore, analysis of copy number variants found a higher frequency of large chromosomal rearrangements (>2 Mb) in ASD cases (5/103) than in ethnically matched control subjects (1/197, p = 0.019).
Our findings illustrate the remarkable utility of high-density genotype data for subjects from a limited geographic area in dissecting genetic heterogeneity with respect to population and disease related variation.
Unlike genome-wide association studies, few comprehensive studies of copy number variation's contribution to complex human disease susceptibility have been performed. Copy number variations are abundant in humans and represent one of the least well-studied classes of genetic variants; in addition, known rheumatoid arthritis susceptibility loci explain only a portion of familial clustering. Therefore, we performed a genome-wide study of association between deletion or excess homozygosity and rheumatoid arthritis using high-density 550 K SNP genotype data from a genome-wide association study. We used a genome-wide statistical method that we recently developed to test each contiguous SNP locus between 868 cases and 1194 controls to detect excess homozygosity or deletion variants that influence susceptibility. Our method is designed to detect statistically significant evidence of deletions or homozygosity at individual SNPs for SNP-by-SNP analyses and to combine the information among neighboring SNPs for cluster analyses. In addition to successfully detecting the known deletion variants in the major histocompatibility complex, we identified 4.3 and 28 kb clusters on chromosomes 10p and 13q, respectively, which were significant at a Bonferroni-type-corrected 0.05 nominal significance level. Independently, we performed analyses using PennCNV, an algorithm for identifying and cataloging copy numbers for individuals based on a hidden Markov model, and identified cases and controls that had chromosomal segments with copy number <2. Using Fisher's exact test for comparing the numbers of cases and controls with copy number <2 per SNP, we identified 26 significant SNPs (protective; more controls than cases) aggregating on chromosome 14 with P-values <10−8.
Seropositive rheumatoid arthritis is genetically linked to a group of HLA-DRB1 alleles sharing a sequence motif within the third hypervariable region. Controversy exists over the role of the distinct allelic variants in affecting not only the risk to develop disease, but also in modifying the expression of the disease. We have stratified 81 patients according to their patterns of disease manifestations and identified the HLA-DRB1 alleles by polymerase chain reaction amplification and subsequent oligonucleotide hybridization. To identify precisely the allelic combinations at the HLA-DRB1 locus, homozygosity was confirmed by locus-specific cDNA amplification and subsequent sequencing. Our study demonstrated a high correlation of allelic combinations of disease-associated HLA-DRB1 alleles with the clinical manifestations. Characteristic genotypes were identified for patients who had progressed toward nodular disease and patients who had developed major organ involvement. Rheumatoid nodules were highly associated with a heterozygosity for two disease associated HLA-DRB1 alleles. Homozygosity for the HLA-DRB1*0401 allele was a characteristic finding for RA patients with major organ involvement. Our data suggest a role of the disease-associated sequence motif in determining severity of the disease. The finding of a codominant function of HLA-DRB1 alleles suggests that the biological function of HLA-DR molecules in thymic selection might be important in the pathogenesis of RA.
We performed a whole-genome association study of rheumatoid arthritis susceptibility using Illumina 550k single-nucleotide polymorphism (SNP) genotypes of 868 cases and 1194 controls from the North American Rheumatoid Arthritis Consortium (NARAC). Structured association analysis with adjustment for potential population stratification yielded 200 SNPs with p < 1 × 10−8 for association with RA, all of which were on chromosome 6 in a 2.7-Mb region of the major histocompatibility complex (MHC). Given the extensive linkage disequilibrium in the region and the known risk conferred by HLA-DRB1 alleles, we then applied conditional analyses to ascertain independent signals for RA susceptibility among these 200 candidate SNPs. Conditional analyses incorporating risk categories of the HLA-DRB1 "shared epitope" revealed three SNPs having independent associations with RA (conditional p < 0.001). This supports the presence of significant effects on RA susceptibility in the MHC in addition to the shared epitope.
With the exception of the major histocompatibility complex (MHC) and STAT4, no other rheumatoid arthritis (RA) linkage peak has been successfully fine-mapped to date. This apparent failure to identify association under peaks of linkage could be ascribed to the examination of common variation, when linkage is likely to be driven by rare variants. The purpose of this study was to investigate the overlap between genome-wide rare variant RA association signals observed in the Wellcome Trust Case Control Consortium (WTCCC) study and 11 replicating RA linkage peaks, defined as regions with evidence for linkage in >1 study.
The WTCCC data set contained 40,482 variants with minor allele frequency of ≤0.05 in 1,860 RA patients and 2,938 controls. Genotypes of all rare variants within a given gene region were collapsed into a single locus and a global P value was calculated per gene.
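The collapsing step described above, in which all rare variants within a gene region are merged into a single locus, can be sketched as a per-individual carrier indicator. This is one common way to collapse and is an assumption here: the abstract does not specify the exact collapsing rule or the test used for the per-gene P value. The function name, data layout, and identifiers are hypothetical.

```python
from typing import Dict, List, Optional


def collapse_rare_variants(
    genotypes_by_variant: Dict[str, List[Optional[int]]],
    variant_to_gene: Dict[str, str],
) -> Dict[str, List[int]]:
    """Collapse rare variants into one carrier indicator per gene and individual.

    genotypes_by_variant: variant id -> minor-allele counts per individual
        (0, 1 or 2; None for a missing call), same individual order throughout.
    variant_to_gene: variant id -> gene label; unmapped variants are skipped.
    Returns gene -> list with 1 if the individual carries at least one minor
    allele at any rare variant in that gene, else 0.
    """
    collapsed: Dict[str, List[int]] = {}
    for variant, calls in genotypes_by_variant.items():
        gene = variant_to_gene.get(variant)
        if gene is None:
            continue
        flags = collapsed.setdefault(gene, [0] * len(calls))
        for i, call in enumerate(calls):
            if call:  # 1 or 2 minor alleles; 0 and None leave the flag as is
                flags[i] = 1
    return collapsed


# Toy data for three individuals and two hypothetical genes.
carriers = collapse_rare_variants(
    {"rs1": [0, 1, 0], "rs2": [0, 0, 2], "rs3": [1, 0, None]},
    {"rs1": "GENE_A", "rs2": "GENE_A", "rs3": "GENE_B"},
)
```

The resulting carrier vector per gene can then be compared between cases and controls with any standard single-locus test.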
The distribution of rare variant signals (association P ≤ 10−5) was found to differ significantly between regions with and without linkage evidence (P = 2 × 10−17 by Fisher’s exact test). No significant difference was observed after data from the MHC region were removed or when the effect of the HLA–DRB1 locus was accounted for.
The results suggest that rare variant association signals are significantly overrepresented under linkage peaks in RA, but the effect is driven by the MHC. This is the first study to examine the overlap between linkage peaks and rare variant association signals genome-wide in a complex disease.
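The comparison of regions with and without linkage evidence was made with Fisher's exact test. A minimal pure-Python sketch of the two-sided test on a 2x2 table follows, using the usual rule of summing the probabilities of all tables that are no more likely than the observed one. The function name and the toy counts are assumptions; the counts behind the reported P value are not given in the abstract.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    For example: rows = regions with / without linkage evidence,
    columns = genes with / without a rare-variant signal (illustrative layout).
    """
    if min(a, b, c, d) < 0:
        raise ValueError("cell counts must be non-negative")
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def table_probability(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    observed = table_probability(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum over all tables at most as probable as the observed one; the small
    # relative tolerance guards against floating-point ties.
    p_value = sum(
        prob
        for x in range(lowest, highest + 1)
        if (prob := table_probability(x)) <= observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


# Toy table, not data from the study.
p = fisher_exact_two_sided(3, 1, 1, 3)
```

Because the test conditions on the table margins, it remains valid when some cells are small, which is typical when only a handful of regions carry strong signals.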
This research investigates the influence of demographic factors on human genetic sub-structure. In our discovery cohort, we show significant demographic trends for decreasing autozygosity associated with population variation in chronological age. Autozygosity, the genomic signature of consanguinity, is identifiable on a genome-wide level as extended tracts of homozygosity. We identified an average of 28.6 tracts of extended homozygosity greater than 1 Mb in length in a representative population of 809 unrelated North Americans of European descent ranging in chronological age from 19–99 years old. These homozygous tracts made up a population average of 42 Mb of the genome corresponding to 1.6% of the entire genome, with each homozygous tract an average of 1.5 Mb in length. Runs of homozygosity are steadily decreasing in size and frequency as time progresses (linear regression, p<0.05). We also calculated inbreeding coefficients and showed a significant trend for population-wide increasing heterozygosity outside of linkage disequilibrium. We successfully replicated these associations in a demographically similar cohort comprised of a subgroup of 477 Baltimore Longitudinal Study of Aging participants. We also constructed statistical models showing predicted declining rates of autozygosity spanning the 20th century. These predictive models suggest a 14.0% decrease in the frequency of these runs of homozygosity and a 24.3% decrease in the percent of the genome in runs of homozygosity, as well as a 30.5% decrease in excess homozygosity based on the linkage pruned inbreeding coefficients. The trend for decreasing autozygosity due to panmixia and larger effective population sizes will likely affect the frequency of rare recessive genetic diseases in the future. Autozygosity has declined, and it seems it will continue doing so.
Population geneticists use genetic markers to quantify and compare levels of inbreeding in populations and identify disease-associated loci; epidemiologists utilize demographic factors to quantify disease risk modifiers. Our research group sought to investigate the intersection of these two disciplines and examine the way in which demographic trends associated with decreasing levels of inbreeding may influence genomic structure and how this may affect medical genetics research. By examining two age-heterogeneous populations of outbred North Americans, we were able to ascertain genetic changes occurring over the past century that have been likely brought about by recent increases in mobility, urbanization, and population admixture. Using multiple measures of the genomic manifestations of distant consanguinity, we showed significant trends towards decreasing levels of autozygosity and more marginal inbreeding coefficients as study participant birth years neared the chronological present day. We believe this finding is particularly important, as decreasing autozygosity and less homozygosity genome-wide may help to slightly reduce the burden of rare recessive diseases in the future.
Homozygosity mapping is a common method for mapping recessive traits in consanguineous families. In most studies, applications for multipoint linkage analyses are applied to determine the genomic region linked to the disease. Unfortunately, these are neither suited for very large families nor for the inclusion of tens of thousands of SNPs. Even if fewer than 10 000 markers are employed, such an analysis may easily last hours if not days. Here we present a web-based approach to homozygosity mapping. Our application stores marker data in a database into which users can directly upload their own SNP genotype files. Within a few minutes, the database analyses the data, detects homozygous stretches and provides an intuitive graphical interface to the results. The homozygosity in affected individuals is visualized genome-wide with the ability to zoom into single chromosomes and user-defined chromosomal regions. The software also displays the underlying genotypes in all samples. It is integrated with our candidate gene search engine, GeneDistiller, so that users can interactively determine the most promising gene. They can at any point restrict access to their data or make it public, allowing HomozygosityMapper to be used as a data repository for homozygosity-mapping studies. HomozygosityMapper is available at http://www.homozygositymapper.org/.
Identification of disease variants via homozygosity mapping and investigation of the effects of genome-wide homozygosity regions on traits of biomedical importance have been widely applied recently. Nonetheless, the existing methods and algorithms to identify long tracts of homozygosity (TOH) cannot efficiently and rigorously define regions for downstream association investigation. We expanded current methods to identify TOHs by defining “surrogate-TOH”, a region covering a cluster of TOHs with specific characteristics. Our defined surrogate-TOH includes cTOH, viz. a common TOH region where at least ten TOHs are present; gTOH, whereby a group of highly overlapping TOHs share proximal boundaries; and aTOH, which are allelically-matched TOHs. Searching for gTOH and aTOH was based on a repeated binary spectral clustering algorithm, where a hierarchy of clusters is created and represented by a TOH cluster tree. Based on the proposed method of identifying different types of surrogate-TOH, our cgaTOH software was developed. The software provides an intuitive and interactive visualization tool, with special interactive navigation rings, for better investigation of the high-throughput output; it is applicable to both conventional association studies and more sophisticated downstream analyses. The NCBI genome map viewer is incorporated into the system. Moreover, we discuss the choice of appropriate empirical ranges for critical parameters by applying the method to disease models. This method identifies various patterned clusters of SNPs demonstrating extended homozygosity, so that one can observe different aspects of the multi-faceted characteristics of TOHs.
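The cTOH definition above (a region where at least ten TOHs are present) can be illustrated with a simple sweep over interval endpoints. This is a minimal sketch under stated assumptions, not the cgaTOH implementation: TOHs are assumed to be half-open intervals `[start, end)` in base pairs on a single chromosome, and the function name `common_regions` and the parameter `min_count` are hypothetical.

```python
from typing import List, Tuple


def common_regions(tohs: List[Tuple[int, int]], min_count: int = 10) -> List[Tuple[int, int]]:
    """Return half-open regions covered by at least `min_count` TOHs."""
    events = []
    for start, end in tohs:
        events.append((start, 1))   # a TOH opens
        events.append((end, -1))    # a TOH closes
    # Tuples sort by position first; at equal positions -1 sorts before +1,
    # so closes are processed before opens, matching half-open intervals.
    events.sort()

    regions = []
    depth = 0            # number of TOHs covering the current position
    region_start = None  # start of the current qualifying region, if any
    for pos, delta in events:
        depth += delta
        if depth >= min_count and region_start is None:
            region_start = pos
        elif depth < min_count and region_start is not None:
            if pos > region_start:
                regions.append((region_start, pos))
            region_start = None
    return regions


tohs = [(0, 10), (5, 15), (8, 20), (30, 40)]
print(common_regions(tohs, min_count=2))
```

The gTOH and aTOH categories depend on boundary similarity and allele matching, which this coverage count does not attempt to model.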
Traditional genome-wide association studies (GWAS) of large cohorts of subjects with chronic obstructive pulmonary disease (COPD) have successfully identified novel candidate genes, but several other plausible loci do not meet strict criteria for genome-wide significance after correction for multiple testing.
We hypothesize that by applying unbiased weights derived from unique populations we can identify additional COPD susceptibility loci.
We performed a homozygosity haplotype analysis on a group of subjects with and without COPD to identify regions of conserved homozygosity (RCHH). Weights were constructed based on the frequency of these RCHH in cases vs. controls, and used to adjust the P values from a large collaborative GWAS of COPD.
We identified 2,318 regions of conserved homozygosity, of which 576 were significantly (P < .05) overrepresented in cases. After applying the weights constructed from these regions to a collaborative GWAS of COPD, we identified two single nucleotide polymorphisms in a novel gene (FGF7) that gained genome-wide significance by the false discovery rate method. In a follow-up analysis, both SNPs (rs12591300 and rs4480740) were significantly associated with COPD in an independent population (combined P values of 7.9E-07 and 2.8E-06 respectively). In another independent population, increased lung tissue FGF7 expression was associated with worse measures of lung function.
Weights constructed from a homozygosity haplotype analysis of an isolated population successfully identify novel genetic associations from a GWAS on a separate population. This method can be used to identify promising candidate genes that fail to meet strict correction for multiple testing.
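The abstract does not specify how the weights entered the false discovery rate calculation. One common approach, shown here purely as an assumption, is weighted Benjamini-Hochberg: each P value is divided by its weight (with weights rescaled to average 1) and the standard step-up procedure is applied to the result. The function name `weighted_bh` and the example values are hypothetical and are not taken from the study.

```python
from typing import List


def weighted_bh(pvalues: List[float], weights: List[float], alpha: float = 0.05) -> List[bool]:
    """Weighted Benjamini-Hochberg; returns True where a test is rejected.

    Assumption: weights are positive; they are rescaled to have mean 1.
    """
    m = len(pvalues)
    mean_w = sum(weights) / m
    norm_w = [w / mean_w for w in weights]
    # Weighted P values: up-weighted tests get smaller values.
    q = [min(p / w, 1.0) for p, w in zip(pvalues, norm_w)]

    # Step-up rule: find the largest rank k with q_(k) <= k * alpha / m.
    order = sorted(range(m), key=lambda i: q[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if q[i] <= rank * alpha / m:
            cutoff = rank

    rejected = [False] * m
    for i in order[:cutoff]:
        rejected[i] = True
    return rejected


pvals = [0.001, 0.02, 0.04, 0.5]
print(weighted_bh(pvals, [1, 1, 1, 1]))  # equal weights: ordinary BH
print(weighted_bh(pvals, [1, 1, 3, 1]))  # third test up-weighted
```

With equal weights the third test is not rejected; tripling its weight is enough to bring it under the threshold, which is the kind of gain in power that prior-informed weighting aims for.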
Previous investigations have reported linkage disequilibrium occurring between nearby polymorphisms, a block-like structure for such relationships, some instances where surprisingly few haplotypes are found, and regions of extended homozygosity which are especially marked around centromeres and especially common on the X chromosome. We investigated the distribution and nature of regions of extended homozygosity in a sample of 1411 subjects included in a genome-wide association study. Regions of extended homozygosity over 1 Mb are common, with an average of 35.9 occurring per subject, and containing on average 73 homozygous markers. They have a markedly non-random distribution. They are relatively common on the X chromosome and are seen at centromeres but are also concentrated at other chromosomal regions where presumably recombination is rare. They seem to be a consequence of some haplotypes being very common in the population. Although sometimes this reflects the effect of a single very common haplotype, we also note examples of two or three common haplotypes, each very different from the others, underlying this effect. Regions of extended homozygosity are more common than previously appreciated. They result from the presence of extended haplotypes with high population frequency. Such regions concentrate in particular locations. The haplotypes involved are sometimes markedly disparate from each other. These regions offer a valuable opportunity for further investigation, in particular with regard to their ancestral history.
Homozygosity; extended haplotypes
The genetic association of the major histocompatibility complex (MHC) to rheumatoid arthritis risk has commonly been attributed to HLA-DRB1 alleles. Yet controversy persists about the causal variants in HLA-DRB1 and the presence of independent effects elsewhere in the MHC. Using existing genome-wide SNP data in 5,018 seropositive cases and 14,974 controls, we imputed and tested classical alleles and amino acid polymorphisms for HLA-A, B, C, DPA1, DPB1, DQA1, DQB1, and DRB1 along with 3,117 SNPs across the MHC. Conditional and haplotype analyses reveal that three amino acid positions (11, 71 and 74) in HLA-DRβ1, and single amino acid polymorphisms in HLA-B (position 9) and HLA-DPβ1 (position 9), all located in the peptide-binding grooves, almost completely explain the MHC association to disease risk. This study illustrates how imputation of functional variation from large reference panels can help fine-map association signals in the MHC.
We report on the validation and implementation of the HumanCytoSNP-12 array (Illumina) (HCS) in prenatal diagnosis. In total, 64 samples were used to validate the Illumina platform (20 with a known (sub)microscopic chromosome abnormality, 5 with known maternal cell contamination (MCC) and 39 normal control samples). There were no false-positive or false-negative results. In addition to the diagnostic possibilities of arrayCGH, the HCS allows detection of regions of homozygosity (ROH) and triploidy, and helps to recognise MCC. Moreover, in two cases of MCC, a deletion was correctly detected. Furthermore, we found that only about 50 ng of DNA is required, which allows a reporting time of only 3 days. We also present a prospective pilot study of 61 fetuses with ultrasound abnormalities and a normal karyotype tested with HCS. In 4 out of 61 (6.5%) fetuses, a clinically relevant abnormality was detected. We designed and present pre-test genetic counselling information on categories of possible test outcomes. On the basis of this information, about 90% of the parents chose to be informed about adverse health outcomes of their future child in infancy and childhood, and 55% also about outcomes in adulthood. The latter issue, regarding the right of the future child to decide for itself whether or not to know this information, needs to be addressed.
genomic SNP array; rapid genomic array testing; whole-genome screening; pre-test genetic counselling; prenatal diagnosis