Genetic heterogeneity, which may manifest on a population level as different frequencies of a specific disease susceptibility allele in different subsets of patients, is a common problem for candidate gene and genome-wide association studies of complex human diseases. The ordered subset analysis (OSA) was originally developed as a method to reduce genetic heterogeneity in the context of family-based linkage studies. Here, we have extended a previously proposed method (OSACC) for applying the OSA methodology to case-control datasets. We have evaluated the type I error and power of different OSACC permutation tests with an extensive simulation study. Case-control datasets were generated under two different models by which continuous clinical or environmental covariates may influence the relationship between susceptibility genotypes and disease risk. Our results demonstrate that OSACC is more powerful under some disease models than the commonly used trend test and a previously proposed joint test of main genetic and gene-environment interaction effects. An additional unique benefit of OSACC is its ability to identify a more informative subset of cases that may be subjected to more detailed molecular analysis, such as DNA sequencing of selected genomic regions to detect functional variants in linkage disequilibrium with the associated polymorphism. The OSACC-identified covariate threshold may also improve the power of an additional dataset to replicate previously reported associations that may only be detectable in a fraction of the original and replication datasets. In summary, we have demonstrated that OSACC is a useful method for improving SNP association signals in genetically heterogeneous datasets.
genetic heterogeneity; association analysis; sequencing study design; permutation test; SIMLA
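The ordered-subset idea described in the abstract above can be illustrated with a minimal sketch. It is not the authors' OSACC implementation. It assumes genotypes coded 0/1/2, cases ranked by ascending covariate only, every nested subset of cases compared with the full control set, an allelic chi-square as the association statistic (chosen for brevity), and a permutation test that shuffles covariate values among cases. The function names and all of these choices are illustrative assumptions.

```python
import random

def allele_chi2(case_genos, control_genos):
    """1-df chi-square on the 2x2 table of allele counts (genotypes coded 0/1/2)."""
    a = sum(case_genos)                  # risk alleles in cases
    b = 2 * len(case_genos) - a          # other alleles in cases
    c = sum(control_genos)               # risk alleles in controls
    d = 2 * len(control_genos) - c       # other alleles in controls
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom

def max_subset_stat(case_genos, case_covs, control_genos, min_cases=2):
    """Largest statistic over nested case subsets ordered by covariate.

    Returns (statistic, covariate value of the last case in the best subset).
    Tied covariate values are not treated specially in this sketch.
    """
    order = sorted(range(len(case_genos)), key=lambda i: case_covs[i])
    best_stat, best_threshold = 0.0, None
    subset = []
    for rank, i in enumerate(order, start=1):
        subset.append(case_genos[i])
        if rank < min_cases:
            continue
        stat = allele_chi2(subset, control_genos)
        if stat > best_stat:
            best_stat, best_threshold = stat, case_covs[i]
    return best_stat, best_threshold

def osa_permutation_p(case_genos, case_covs, control_genos, n_perm=1000, seed=0):
    """Empirical p-value for the maximum subset statistic (hypothetical helper)."""
    rng = random.Random(seed)
    observed, threshold = max_subset_stat(case_genos, case_covs, control_genos)
    covs = list(case_covs)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(covs)                # break the genotype-covariate link in cases
        stat, _unused = max_subset_stat(case_genos, covs, control_genos)
        if stat >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1), threshold
```

The returned threshold is the covariate value that closes the best-scoring subset, which corresponds loosely to the covariate threshold the abstract suggests carrying forward to replication or sequencing studies.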
To test whether case-control-based familial aggregation studies produce estimates of risk to relatives that are inherently biased or confounded by age and family size, and to compare case-control-derived estimates with those from the reconstructed cohort method. In addition, we test if the definition of family history affects the accuracy of results obtained from either design. We use simulated data, which allows us to know the true data origin.
We simulated populations of three generation families. Both a dominant genetic disease and a non-genetic disease were present in the population. We compared the effect estimates from different measures of family history with those derived from the actual genetic cause of disease.
Effect estimates from family history measures that used multiple family members were more accurate than those derived from measures based on a single relative. Neither family size nor the age of the family members defining family history was a confounder in the case-control design.
The case-control and reconstructed cohort designs are equally valid in assessing familial aggregation of disease.
Epidemiology; Family history; Computer simulation; Sensitivity; Specificity; Confounding
Genetic risk models could potentially be useful in identifying high-risk groups for the prevention of complex diseases. We investigated the performance of this risk stratification strategy by examining epidemiological parameters that impact the predictive ability of risk models.
We assessed sensitivity, specificity, and positive and negative predictive value for all possible risk thresholds that can define high-risk groups and investigated how these measures depend on the frequency of disease in the population, the frequency of the high-risk group, and the discriminative accuracy of the risk model, as assessed by the area under the receiver-operating characteristic curve (AUC). In a simulation study, we modeled genetic risk scores of 50 genes with equal odds ratios and genotype frequencies, and varied the odds ratios and the disease frequency across scenarios. We also performed a simulation of age-related macular degeneration risk prediction based on published odds ratios and frequencies for six genetic risk variants.
We show that when the frequency of the high-risk group was lower than the disease frequency, positive predictive value increased with the AUC but sensitivity remained low. When the frequency of the high-risk group was higher than the disease frequency, sensitivity was high but positive predictive value remained low. When both frequencies were equal, both positive predictive value and sensitivity increased with increasing AUC, but higher AUC was needed to maximize both measures.
The performance of risk stratification is strongly determined by the frequency of the high-risk group relative to the frequency of disease in the population. The identification of high-risk groups with appreciable combinations of sensitivity and positive predictive value requires higher AUC.
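The dependence on the two frequencies reported above follows from a simple identity: the fraction of the population that is both high-risk and diseased equals PPV times the high-risk frequency, and also sensitivity times the disease frequency. The sketch below, with hypothetical function names, computes the standard measures for one threshold and the ceilings this identity implies. It assumes a single dichotomized risk model and a high-risk group of nonzero size.

```python
def risk_group_metrics(sensitivity, specificity, disease_freq):
    """Return (high_risk_freq, ppv, npv) for one risk threshold."""
    tp = sensitivity * disease_freq                # high-risk and diseased
    fp = (1 - specificity) * (1 - disease_freq)    # high-risk and healthy
    fn = (1 - sensitivity) * disease_freq          # low-risk and diseased
    tn = specificity * (1 - disease_freq)          # low-risk and healthy
    high_risk_freq = tp + fp
    ppv = tp / high_risk_freq
    npv = tn / (tn + fn)
    return high_risk_freq, ppv, npv

def max_sensitivity(high_risk_freq, disease_freq):
    """Ceiling on sensitivity, reached only when PPV is 1."""
    return min(1.0, high_risk_freq / disease_freq)

def max_ppv(high_risk_freq, disease_freq):
    """Ceiling on PPV, reached only when sensitivity is 1."""
    return min(1.0, disease_freq / high_risk_freq)
```

For example, with a disease frequency of 10% and a high-risk group covering 5% of the population, `max_sensitivity(0.05, 0.10)` is 0.5 no matter how well the model discriminates, which matches the low sensitivity reported when the high-risk group is smaller than the disease frequency.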
Risk assessment constitutes an essential component of genetic counseling and testing, and the genetic risk should be estimated as accurately as possible for individual and family decision making. All relevant information retrieved from population studies and pedigree and genetic testing enhances the accuracy of the assessment of an individual's genetic risk. This review will focus on the following general aspects implicated in risk assessment: the increasing genetic information regarding disease; complex traits versus Mendelian disorders; and the influence of the environment and disease susceptibility. The influence of these factors on risk assessment will be discussed.
genetics; risk assessment; genetic testing; genetic risk; genetic susceptibility
Cleft lip or palate (CL/P) is a common facial defect present in 1:700 live births and results in a substantial burden to patients. More than 500 CL/P syndromes have been described, the causes of which may be single-gene mutations, chromosomopathies, or exposure to teratogens. Some of the most prevalent syndromic forms of CL/P have a known etiology. Nonsyndromic CL/P, on the other hand, is a complex disorder whose etiology is still poorly understood. Recent genome-wide association studies have contributed to the elucidation of the genetic causes by identifying reproducible genetic susceptibility variants; their etiopathogenic roles, however, are difficult to predict, as in the case of the chromosomal region 8q24, the most corroborated locus predisposing to nonsyndromic CL/P. Knowing the genetic causes of CL/P will directly affect genetic counseling, by allowing precise recurrence risks to be estimated, and patient management, since patient follow-up may be partially influenced by genetic background. This paper focuses on the genetic causes of important syndromic CL/P forms (van der Woude syndrome, 22q11 deletion syndrome, and Robin sequence-associated syndromes) and depicts the recent findings in nonsyndromic CL/P research, addressing issues relevant to the practice of the geneticist.
It is well known that the presence of population stratification (PS) may cause the usual test in case-control studies to produce spurious gene-disease associations. However, the joint impact of PS and sample selection (SS) is less well known. In this paper, we provide a systematic study of the joint effect of PS and SS under a more general risk model containing genetic and environmental factors. We provide simulation results to show the magnitude of the bias and its impact on the type I error rate of the usual chi-square test under a wide range of PS levels and selection biases.
The biases in the estimation of the main and interaction effects are quantified and their bounds derived. The estimated bounds can be used to compute conservative p-values for the association test. If the conservative p-value is smaller than the significance level, we can safely claim that the association is significant regardless of whether PS or selection bias is present. We also identify conditions under which the bias is null. The bias depends on the allele frequencies, exposure rates, gene-environment odds ratios and disease risks across subpopulations, and on the sampling of the cases and controls.
Our results show that the bias cannot be ignored even if the case and control data are matched on ethnicity. A real example is given to illustrate the application of the conservative p-value. These results are useful for genetic association studies of main and interaction effects.
The risk of breast cancer to first degree relatives of breast cancer patients is approximately twice that of the general population. Breast cancer, however, is a heterogeneous disease and it is plausible that the familial relative risk (FRR) for breast cancer may differ by the pathological subtype of the tumour. The contribution of genetic variants associated with breast cancer susceptibility to the subtype-specific FRR is still unclear.
We computed breast cancer FRR for subtypes of breast cancer by comparing breast cancer incidence in relatives of breast cancer cases from a population-based series with known estrogen receptor (ER), progesterone receptor (PR) or human epidermal growth factor receptor 2 (HER2) status with that expected from the general population. We estimated the contribution to the FRR of genetic variants associated with breast cancer susceptibility using subtype-specific genotypic relative risks and allele frequencies for each variant.
At least one marker was measured for 4,590 breast cancer cases, who reported 9,014 affected and unaffected first-degree female relatives. There was no difference between the breast cancer FRR for relatives of patients with ER-negative (FRR = 1.78, 95% confidence intervals (CI): 1.44 to 2.11) and ER-positive disease (1.82, 95% CI: 1.67 to 1.98), P = 0.99. There was some suggestion that the breast cancer FRR for relatives of patients with ER-negative disease was higher than that for ER-positive disease for ages of the relative less than 50 years old (FRR = 2.96, 95% CI: 2.04 to 3.87; and 2.05, 95% CI: 1.70 to 2.40 respectively; P = 0.07), and that the breast cancer FRR for relatives of patients with ER-positive disease was higher than for ER-negative disease when the age of the relative was greater than 50 years (FRR = 1.76, 95% CI: 1.59 to 1.93; and 1.41, 95% CI: 1.08 to 1.74 respectively, P = 0.06). We estimated that mutations in BRCA1 and BRCA2 explain 32% of breast cancer FRR for relatives of patients with ER-negative and 9.4% of the breast cancer FRR for relatives of patients with ER-positive disease. Twelve recently identified common breast cancer susceptibility variants were estimated to explain 1.9% and 9.6% of the FRR to relatives of patients with ER-negative and ER-positive disease respectively.
FRR for breast cancer was significantly increased for both ER-negative and ER-positive disease. Including receptor status in conjunction with genetic status may aid risk prediction in women with a family history.
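The abstract above does not state how the contribution of each variant to the FRR was computed. One common approach, sketched here as an assumption, treats each variant as a biallelic locus in Hardy-Weinberg proportions with a multiplicative per-allele relative risk, computes the familial relative risk it would produce on its own for the offspring of an affected parent, and expresses that as a share of the observed FRR on the log scale (which assumes variants combine multiplicatively). Function and parameter names are hypothetical.

```python
import math

def variant_frr(risk_allele_freq, per_allele_rr):
    """Familial relative risk (affected parent to offspring) due to one variant.

    Assumes Hardy-Weinberg proportions and genotype risks 1, r, r**2.
    """
    p, r = risk_allele_freq, per_allele_rr
    q = 1.0 - p
    return (p * r * r + q) / (p * r + q) ** 2

def fraction_of_frr_explained(variants, observed_frr):
    """Share of log(observed FRR) attributable to a list of (freq, rr) variants."""
    log_explained = sum(math.log(variant_frr(p, r)) for p, r in variants)
    return log_explained / math.log(observed_frr)
```

A variant with no effect (`per_allele_rr == 1`) contributes a familial relative risk of exactly 1 and therefore explains nothing. Subtype-specific estimates would be obtained by passing subtype-specific relative risks, as the abstract describes.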
The combination of a sire model and a random regression term describing genotype by environment interactions may lead to biased estimates of genetic variance components because of heterogeneous residual variance. In order to test different models, simulated data with genotype by environment interactions, and dairy cattle data assumed to contain such interactions, were analyzed. Two animal models were compared to four sire models. Models differed in their ability to handle heterogeneous variance from different sources. Including an individual effect with a (co)variance matrix restricted to three times the sire (co)variance matrix permitted the modeling of the additive genetic variance not covered by the sire effect. This made the ability of sire models to handle heterogeneous genetic variance approximately equivalent to that of animal models. When residual variance was heterogeneous, a different approach to account for the heterogeneity of variance was needed, for example when using dairy cattle data in order to prevent overestimation of genetic heterogeneity of variance. Including environmental classes can be used to account for heterogeneous residual variance.
It has been estimated that between 5% and 10% of women diagnosed with breast cancer have a hereditary form of the disease, primarily caused by a BRCA1 or BRCA2 gene mutation. Such women have an increased risk of developing a new primary breast and/or ovarian tumor, and may therefore opt for preventive surgery (e.g., bilateral mastectomy, oophorectomy). It is common practice to offer high-risk patients genetic counseling and DNA testing after their primary treatment, with genetic test results being available within 4-6 months. However, some non-commercial laboratories can currently generate test results within 3 to 6 weeks, and thus make it possible to provide rapid genetic counseling and testing (RGCT) prior to primary treatment. The aim of this study is to determine the effect of RGCT on treatment decisions and on psychosocial health.
In this randomized controlled trial, 255 newly diagnosed breast cancer patients with at least a 10% risk of carrying a BRCA gene mutation are being recruited from 12 hospitals in the Netherlands. Participants are randomized in a 2:1 ratio to either an RGCT intervention group (the offer of RGCT directly following diagnosis, with test results available before surgical treatment) or a usual care control group. The primary behavioral outcome is the uptake of direct bilateral mastectomy or delayed prophylactic contralateral mastectomy. Psychosocial outcomes include cancer risk perception, cancer-related worry and distress, health-related quality of life, decisional satisfaction, and the perceived need for and use of additional decisional counseling and psychosocial support. Data are collected via medical chart audits and self-report questionnaires administered prior to randomization and at 6-month and 12-month follow-up.
This trial will provide essential information on the impact of RGCT on the choice of primary surgical treatment among women with breast cancer with an increased risk of hereditary cancer. This study will also provide data on the psychosocial consequences of RGCT and of risk-reducing behavior.
The study is registered at the Netherlands Trial Register (NTR1493) and ClinicalTrials.gov (NCT00783822).
The spatial modeling of infectious disease distributions and dynamics is increasingly being undertaken for health services planning and disease control monitoring, implementation, and evaluation. Where risks are heterogeneous in space or dependent on person-to-person transmission, spatial data on human population distributions are required to estimate infectious disease risks, burdens, and dynamics. Several different modeled human population distribution datasets are available and widely used, but the disparities among them and the implications for enumerating disease burdens and populations at risk have not been considered systematically. Here, we quantify some of these effects using global estimates of populations at risk (PAR) of P. falciparum malaria as an example.
The recent construction of a global map of P. falciparum malaria endemicity enabled the testing of different gridded population datasets for providing estimates of PAR by endemicity class. The estimated population numbers within each class were calculated for each country using four different global gridded human population datasets: GRUMP (~1 km spatial resolution), LandScan (~1 km), UNEP Global Population Databases (~5 km), and GPW3 (~5 km). More detailed assessments of PAR variation and accuracy were conducted for three African countries where census data were available at a higher administrative-unit level than used by any of the four gridded population datasets.
The estimates of PAR based on the datasets varied by more than 10 million people for some countries, even accounting for the fact that estimates of population totals made by different agencies are used to correct national totals in these datasets and can vary by more than 5% for many low-income countries. In many cases, these variations in PAR estimates comprised more than 10% of the total national population. The detailed country-level assessments suggested that none of the datasets was consistently more accurate than the others in estimating PAR. The sizes of such differences among modeled human populations were related to variations in the methods, input resolution, and date of the census data underlying each dataset. Data quality varied from country to country within the spatial population datasets.
Detailed, highly spatially resolved human population data are an essential resource for planning health service delivery for disease control, for the spatial modeling of epidemics, and for decision-making processes related to public health. However, our results highlight that for the low-income regions of the world where disease burden is greatest, existing datasets display substantial variations in estimated population distributions, resulting in uncertainty in disease assessments that utilize them. Increased efforts are required to gather contemporary and spatially detailed demographic data to reduce this uncertainty, particularly in Africa, and to develop population distribution modeling methods that match the rigor, sophistication, and ability to handle uncertainty of contemporary disease mapping and spread modeling. In the meantime, studies that utilize a particular spatial population dataset need to acknowledge the uncertainties inherent within them and consider how the methods and data that comprise each will affect conclusions.
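The PAR comparison described above amounts to summing gridded population counts within each endemicity class and then comparing the totals across population datasets. The following is a minimal sketch under the simplifying assumption that every population grid has already been resampled to the same cells as the endemicity map, which the real datasets (about 1 km versus about 5 km) would not satisfy without preprocessing. The class labels, grid names, and numbers are invented for illustration.

```python
from collections import defaultdict

def par_by_class(population, endemicity):
    """Sum population per endemicity class over aligned grid cells.

    Cells whose class is None (outside the mapped transmission area) are skipped.
    """
    if len(population) != len(endemicity):
        raise ValueError("grids must have the same number of cells")
    totals = defaultdict(float)
    for people, cls in zip(population, endemicity):
        if cls is not None:
            totals[cls] += people
    return dict(totals)

def par_spread(estimates):
    """Largest minus smallest PAR estimate across datasets for one class."""
    values = list(estimates.values())
    return max(values) - min(values)

# Toy example: one endemicity map, two hypothetical population grids.
endemicity = ["low", "low", "high", None]
grid_a = [100, 200, 300, 50]
grid_b = [150, 150, 250, 100]
par_a = par_by_class(grid_a, endemicity)
par_b = par_by_class(grid_b, endemicity)
high_spread = par_spread({"grid_a": par_a["high"], "grid_b": par_b["high"]})
```

In the toy example both grids agree on the low-endemicity total but differ by 50 people in the high-endemicity class, the kind of between-dataset disagreement the study quantifies at national scale.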
In this retrospective study, we examined changes in decision-making for and against the predictive genetic test for Huntington's disease in 478 persons at risk who had undergone genetic counselling in one centre in Germany between 1993 and 2004. At the outset of the counselling procedure the majority of subjects (71%) wanted to make use of the test, yet the actual demand for the predictive test result declined from 67% to 38% over the years. In addition, the time interval between the counselling session and blood withdrawal, which was determined by the counselees, became shorter: in 2000–2004 the majority of persons at risk made the appointment for blood withdrawal after the shortest possible time span. Demographic factors of the cohort remained comparatively stable over the investigated time period. An association was evident between the rate of test usage and the counselling person. These and other possible factors influencing the timing of predictive DNA testing are discussed. Further studies are necessary to investigate whether changes in test demand rates are a general phenomenon.
genetic testing; genetic counselling; Huntington's disease; predictive testing
The epidemiology of leprosy is characterized by heterogeneity in susceptibility and clustering of disease within households. We aim to assess the extent to which different mechanisms for heterogeneity in leprosy susceptibility can explain household clustering as observed in a large study among contacts of leprosy patients.
We used a microsimulation model, parameterizing it with data from over 20,000 contacts of leprosy patients in Bangladesh. We simulated six mechanisms producing heterogeneity in susceptibility: (1) susceptibility was allocated at random to persons (i.e. no additional mechanism), (2) a household factor, (3, 4) a genetic factor (dominant or recessive), or (5, 6) half a household factor and half genetic. We further assumed that a fraction of 5%, 10%, and 20% of the population was susceptible, leading to a total of 18 scenarios to be fitted to the data. We obtained an acceptable fit for each of the six mechanisms, thereby excluding none of the possible underlying mechanisms for heterogeneity of susceptibility to leprosy. However, the distribution of leprosy among contacts did differ between mechanisms, and predicted trends in the declining leprosy case detection were dependent on the assumed mechanism, with genetic-based susceptibility showing the slowest decline. Clustering of leprosy within households is partially caused by an increased transmission within households independent of the leprosy susceptibility mechanism. Even a large and detailed data set on contacts of leprosy patients could not unequivocally reveal the mechanism most likely responsible for heterogeneity in leprosy susceptibility.
Commonly-occurring disease etiology may involve complex combinations of genes and exposures resulting in etiologic heterogeneity. We present a computational algorithm that employs clique-finding for heterogeneity and multidimensionality in biomedical and epidemiological research (the “CHAMBER” algorithm).
This algorithm uses graph-building to (1) identify genetic variants that influence disease risk and (2) predict individuals at risk for disease based on inherited genotype. We use a set-covering algorithm to identify optimal cliques and a Boolean function that identifies etiologically heterogeneous groups of individuals. We evaluated this approach using simulated case-control genotype-disease associations involving two- and four-gene patterns. The CHAMBER algorithm correctly identified these simulated etiologies. We also used two population-based case-control studies of breast and endometrial cancer in African American and Caucasian women considering data on genotypes involved in steroid hormone metabolism. We identified novel patterns in both cancer sites that involved genes that sulfate or glucuronidate estrogens or catecholestrogens. These associations were consistent with the hypothesized biological functions of these genes. We also identified cliques representing the joint effect of multiple candidate genes in all groups, suggesting the existence of biologically plausible combinations of hormone metabolism genes in both breast and endometrial cancer in both races.
The CHAMBER algorithm may have utility in exploring the multifactorial etiology and etiologic heterogeneity in complex disease.
The dynamic nature of contact patterns creates diverse temporal structures. In particular, empirical studies have shown that contact patterns follow heterogeneous inter-event time intervals, meaning that periods of high activity are followed by long periods of inactivity. To investigate the impact of these heterogeneities in the spread of infection from a theoretical perspective, we propose a stochastic model to generate temporal networks where vertices make instantaneous contacts following heterogeneous inter-event intervals, and may leave and enter the system. We study how these properties affect the prevalence of an infection and estimate R0, the number of secondary infections of an infectious individual in a completely susceptible population, by modeling simulated infections (SI and SIR) that co-evolve with the network structure. We find that heterogeneous contact patterns cause earlier and larger epidemics in the SIR model in comparison to homogeneous scenarios for a vast range of parameter values, while smaller epidemics may happen in some combinations of parameters. In the case of SI and heterogeneous patterns, the epidemics develop faster in the earlier stages followed by a slowdown in the asymptotic limit. For increasing vertex turnover rates, heterogeneous patterns generally cause higher prevalence in comparison to homogeneous scenarios with the same average inter-event interval. We find that R0 is generally higher for heterogeneous patterns, except for sufficiently large infection duration and transmission probability.
Networks of sexual contacts and of spatial proximity are of interest for the understanding of epidemics because they define potential pathways by which sexual and airborne infections spread. These networks are not static but vary, with both vertices and links appearing and disappearing at different times. One of the temporal properties observed across systems is that the time lapse between two contacts is irregular, which means that high activity is followed by long intervals of idleness. In this article, by using a theoretical model of a dynamic network co-evolving with a simulated infection, we show that such heterogeneity leads to earlier epidemic outbreaks and increased prevalence of infections for a range of parameters, in comparison to scenarios of regular activity, which is the current modeling paradigm in mathematical epidemiology. We also include a turnover rate to model individuals entering and leaving the system, and we show that if turnover is high, the relative difference in the prevalence of heterogeneous and homogeneous contact patterns increases due to the continuous influx of susceptible individuals. These heterogeneities also increase the expected number of secondary infections produced by a single infected vertex in a completely susceptible population.
We describe the study design, procedures, and development of the risk counseling protocol used in a randomized controlled trial to evaluate the impact of genetic testing for diabetes mellitus (DM) on psychological, health behavior, and clinical outcomes.
Eligible patients are aged 21 to 65 years with body mass index (BMI) ≥27 kg/m2 and no prior diagnosis of DM. At baseline, conventional DM risk factors are assessed, and blood is drawn for possible genetic testing. Participants are randomized to receive conventional risk counseling for DM together with either eye disease counseling (control) or genetic test results. The counseling protocol was pilot tested to identify an acceptable graphical format for conveying risk estimates and to match the length of the eye disease counseling to that of the genetic counseling. Risk estimates are presented with a vertical bar graph denoting risk level with colors and descriptors. After receiving either genetic counseling regarding risk for DM or control counseling on eye disease, all participants are given brief lifestyle counseling for the prevention of DM.
A standardized risk counseling protocol is being used in a randomized trial of 600 participants. Results of this trial will inform policy about whether risk counseling should include genetic counseling.
ClinicalTrials.gov Identifier NCT01060540
Genetic testing; Type II diabetes; Weight loss
Genetic counseling is now routinely offered to individuals at high risk of carrying a BRCA1 or BRCA2 mutation. Risk prediction provided by the counselor requires reliable estimates of the mutation penetrance. Such penetrance has been investigated by studies worldwide. The reported estimates vary. To facilitate clinical management and counseling of the at-risk population, we address this issue through a meta-analysis.
We conducted a literature search on PubMed and selected studies that had nonoverlapping patient data, contained genotyping information, used statistical methods that account for the ascertainment, and reported risks in a useable format. We subsequently combined the published estimates using the DerSimonian and Laird random effects modeling approach.
Ten studies were eligible under the selection criteria. Between-study heterogeneity was observed. Study population, mutation type, design, and estimation methods did not seem to be systematic sources of heterogeneity. Meta-analytic mean cumulative cancer risks for mutation carriers at age 70 years were as follows: breast cancer risk of 57% (95% CI, 47% to 66%) for BRCA1 and 49% (95% CI, 40% to 57%) for BRCA2 mutation carriers; and ovarian cancer risk of 40% (95% CI, 35% to 46%) for BRCA1 and 18% (95% CI, 13% to 23%) for BRCA2 mutation carriers. We also report the prospective risks of developing cancer for currently asymptomatic carriers.
This article provides a set of risk estimates for BRCA1 and BRCA2 mutation carriers that can be used by counselors and clinicians who are interested in advising patients based on a comprehensive set of studies rather than one specific study.
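The DerSimonian and Laird approach named in the methods above can be sketched in a few lines. The function below pools study estimates with inverse-variance weights after adding a method-of-moments estimate of the between-study variance. The abstract does not say on which scale the cumulative risks were pooled, so the inputs here are simply generic estimates with their within-study variances, and the function name is an illustrative choice.

```python
import math

def dersimonian_laird(estimates, variances):
    """Random-effects pooled estimate, its standard error, and tau^2."""
    k = len(estimates)
    w = [1.0 / v for v in variances]                   # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sw
    # Cochran's Q measures dispersion around the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sw - sum(wi * wi for wi in w) / sw
    # Method-of-moments between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]     # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2
```

A 95% confidence interval under the usual normal approximation is `pooled ± 1.96 * se`. When the studies agree closely, `tau2` is truncated to zero and the result coincides with the fixed-effect estimate.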
Keratoconus (KC; Mendelian Inheritance in Man (OMIM) 14830) is a bilateral, progressive corneal defect affecting all ethnic groups around the world. It is the leading cause of corneal transplantation. The age of onset is at puberty, and the disorder is progressive until the 3rd–4th decade of life, when it usually arrests. It is one of the major ocular problems, with significant social and economic impact because the disease affects the young generation. Although genetic and environmental factors are associated with KC, the precise etiology is still elusive. Results from complex segregation analysis suggest that genetic abnormalities may play an essential role in the susceptibility to KC. Reflecting this genetic heterogeneity, a recent study reported 17 different genomic loci identified in KC families by linkage mapping in various populations. The focus of this review is to provide a concise update on the current knowledge of the genetic basis of KC and on genomic approaches to understanding the disease pathogenesis.
Disease pathogenesis; genetic heterogeneity; genetics and genomics; genome-wide association study; genomic loci; keratoconus; linkage mapping; molecular mechanisms; whole exome-genome sequencing
Current practice for patients with breast cancer referred for genetic counseling includes face-to-face consultations with a genetic counselor prior to and following DNA-testing. This is based on guidelines regarding Huntington’s disease, in anticipation of a high psychosocial impact of DNA-testing for mutations in BRCA1/2 genes. The initial consultation, held before testing, covers generic information regarding hereditary breast cancer and the (im)possibilities of DNA-testing. Patients with breast cancer may see this information as irrelevant or unnecessary because individual genetic advice depends on DNA-test results. Also, verbal information is not always remembered well by patients. A different format for this information prior to DNA-testing is possible: replacing initial face-to-face genetic counseling (DNA-intake procedure) by telephone, written and digital information sent to patients’ homes (DNA-direct procedure).
In this intervention study, 150 patients with breast cancer referred to the department of Clinical Genetics of the Radboud University Nijmegen Medical Centre are given the choice between two procedures, DNA-direct (intervention group) or DNA-intake (usual care, control group). During a triage telephone call, patients are excluded if they have problems with Dutch text, family communication, or of psychological or psychiatric nature. Primary outcome measures are satisfaction and psychological distress. Secondary outcome measures are determinants for the participant’s choice of procedure, waiting and processing times, and family characteristics. Data are collected by self-report questionnaires at baseline and following completion of genetic counseling. A minority of participants will receive an invitation for a 30 min semi-structured telephone interview, e.g. confirmed carriers of a BRCA1/2 mutation, and those who report problems with the procedure.
This study compares current practice of an intake consultation (DNA-intake) to a home informational package of telephone, written and digital information (DNA-direct) prior to DNA-testing in patients with breast cancer. The aim is to determine whether DNA-direct is an acceptable procedure for BRCA1/2 testing, in order to provide customized care to patients with breast cancer, cutting down on the period of uncertainty during this diagnostic process.
The study is registered at the Dutch Trial Registry http://www.trialregister.nl (NTR3018).
Hereditary; Breast cancer; BRCA; Genetic; Counseling; DNA
Genomewide association studies have become the primary tool for discovering the genetic basis of complex human diseases. Such studies are susceptible to the confounding effects of population stratification, in that the combination of allele-frequency heterogeneity with disease-risk heterogeneity among different ancestral subpopulations can induce spurious associations between genetic variants and disease. This article provides a statistically rigorous and computationally feasible solution to this challenging problem of unmeasured confounders. We show that the odds ratio of disease with a genetic variant is identifiable if and only if the genotype is independent of the unknown population substructure conditional on a set of observed ancestry-informative markers in the disease-free population. Under this condition, the odds ratio of interest can be estimated by fitting a semiparametric logistic regression model with an arbitrary function of a propensity score relating the genotype probability to ancestry-informative markers. Approximating the unknown function of the propensity score by B-splines, we derive a consistent and asymptotically normal estimator for the odds ratio of interest with a consistent variance estimator. Simulation studies demonstrate that the proposed inference procedures perform well in realistic settings. An application to the well-known Wellcome Trust Case-Control Study is presented. Supplemental materials are available online.
B-spline; Case-control study; Principal components; Propensity score; Semiparametric logistic regression; Single nucleotide polymorphism
The responsible genes have not yet been identified for many genetically mapped disease loci. Physically interacting proteins tend to be involved in the same cellular process, and mutations in their genes may lead to similar disease phenotypes.
To investigate whether protein–protein interactions can predict genes for genetically heterogeneous diseases.
72 940 protein–protein interactions between 10 894 human proteins were used to search 432 loci for candidate disease genes representing 383 genetically heterogeneous hereditary diseases. For each disease, the protein interaction partners of its known causative genes were compared with the disease associated loci lacking identified causative genes. Interaction partners located within such loci were considered candidate disease gene predictions. Prediction accuracy was tested using a benchmark set of known disease genes.
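The search described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name `predict_candidates`, the input layouts, and all gene, locus and disease identifiers are hypothetical, and interactions are treated as undirected pairs.

```python
from collections import defaultdict


def predict_candidates(interactions, known_genes, locus_genes):
    """Return candidate disease genes per disease.

    interactions: iterable of (protein_a, protein_b) pairs (treated as undirected).
    known_genes:  dict mapping disease -> set of known causative genes.
    locus_genes:  dict mapping disease -> {locus -> set of genes located in a
                  disease-associated locus that lacks an identified gene}.
    Result:       dict mapping disease -> set of (locus, candidate, known_partner).
    """
    # Build a symmetric lookup of interaction partners.
    partners = defaultdict(set)
    for a, b in interactions:
        partners[a].add(b)
        partners[b].add(a)

    predictions = defaultdict(set)
    for disease, causative in known_genes.items():
        for known in causative:
            known_partners = partners.get(known, set())
            for locus, genes in locus_genes.get(disease, {}).items():
                # A candidate is an interaction partner of a known disease
                # gene that also maps inside an unresolved locus.
                for candidate in known_partners & genes:
                    predictions[disease].add((locus, candidate, known))
    return dict(predictions)
```

Each prediction keeps the known partner that motivated it, so a candidate can be traced back to the interaction that produced it.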
Almost 300 candidate disease gene predictions were made. Some of these have since been confirmed. On average, 10% or more are expected to be genuine disease genes, representing a 10‐fold enrichment compared with positional information only. Examples of interesting candidates are AKAP6 for arrhythmogenic right ventricular dysplasia 3 and SYN3 for familial partial epilepsy with variable foci.
Exploiting protein–protein interactions can greatly increase the likelihood of finding positional candidate disease genes. When applied on a large scale they can lead to novel candidate gene predictions.
disease gene; candidate gene; disease gene prediction; protein–protein interactions; bioinformatics
Genome-wide association studies (GWAS) are routinely conducted for both quantitative and binary (disease) traits. We present two analytical tools for use in the experimental design of GWAS. Firstly, we present power calculations quantifying power in a unified framework for a range of scenarios. In this context we consider the utility of quantitative scores (e.g. endophenotypes) that may be available on cases only or both cases and controls. Secondly, we consider the accuracy of prediction of genetic risk from genome-wide SNPs and derive an expression for genomic prediction accuracy using a liability threshold model for disease traits in a case-control design. The expected values based on our derived equations for both power and prediction accuracy agree well with observed estimates from simulations.
Resequencing is an emerging tool for identification of rare disease-associated mutations. Rare mutations are difficult to tag with SNP genotyping, as genotyping studies are designed to detect common variants. However, studies have shown that genetic heterogeneity is a probable scenario for common diseases, in which multiple rare mutations together explain a large proportion of the genetic basis for the disease. Thus, we propose a weighted-sum method to jointly analyse a group of mutations in order to test for groupwise association with disease status. For example, such a group of mutations may result from resequencing a gene. We compare the proposed weighted-sum method to alternative methods and show that it is powerful for identifying disease-associated genes, both on simulated and Encode data. Using the weighted-sum method, a resequencing study can identify a disease-associated gene with an overall population attributable risk (PAR) of 2%, even when each individual mutation has much lower PAR, using 1,000 to 7,000 affected and unaffected individuals, depending on the underlying genetic model. This study thus demonstrates that resequencing studies can identify important genetic associations, provided that specialised analysis methods, such as the weighted-sum method, are used.
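The abstract does not spell out how the weights or the test statistic are defined, so the following Python sketch should be read as one plausible instance of a weighted-sum test rather than the published procedure. It assumes that each variant is down-weighted by the estimated standard deviation of its mutant-allele count, with the frequency taken from unaffected individuals, that individuals are ranked by their weighted mutation score, and that significance is assessed by permuting affection status. All function names are hypothetical.

```python
import math
import random


def weighted_sum_scores(genotypes, affected):
    """genotypes[j][i] is the mutant-allele count (0, 1 or 2) of individual j
    at variant i; affected[j] is True for affected individuals."""
    n = len(genotypes)
    n_variants = len(genotypes[0])
    n_unaff = sum(1 for a in affected if not a)
    weights = []
    for i in range(n_variants):
        m_unaff = sum(g[i] for g, a in zip(genotypes, affected) if not a)
        # Assumed weighting: frequency in unaffecteds with a +1/+2 adjustment
        # so that variants unseen in unaffecteds keep a finite weight.
        q = (m_unaff + 1) / (2 * n_unaff + 2)
        weights.append(math.sqrt(n * q * (1 - q)))
    return [sum(g[i] / weights[i] for i in range(n_variants)) for g in genotypes]


def rank_sum(scores, affected):
    """Sum of the ranks of affected individuals, using average ranks for ties."""
    order = sorted(range(len(scores)), key=lambda j: scores[j])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1  # mean of 1-based ranks pos+1..end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = average_rank
        pos = end + 1
    return sum(r for r, a in zip(ranks, affected) if a)


def weighted_sum_test(genotypes, affected, n_perm=1000, seed=0):
    """Return (observed rank sum, one-sided empirical permutation p-value)."""
    rng = random.Random(seed)
    observed = rank_sum(weighted_sum_scores(genotypes, affected), affected)
    labels = list(affected)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        # Weights depend on who is unaffected, so they are recomputed
        # for every permutation of the labels.
        if rank_sum(weighted_sum_scores(genotypes, labels), labels) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

Pooling variants in this way lets several mutations that are individually too rare to test contribute to a single groupwise statistic, which is the property the abstract relies on.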
Resequencing is an emerging tool for the identification of rare disease-associated mutations. Recent studies have shown that groups of multiple rare mutations together can explain a large proportion of the genetic basis for some diseases. Therefore, we propose a new statistical method for analysing a group of mutations in order to test for groupwise association with disease status. We compare the proposed weighted-sum method to alternative methods and show that it is powerful for identifying disease-associated groups of mutations, both on computer-simulated and real data. By using computer simulations, we further show that resequencing a few thousand individuals is sufficient to perform a genome-wide study of all human genes, if the proposed method is used. This study thus demonstrates that resequencing studies can identify important genetic associations, provided that specialised analysis methods, such as the proposed weighted-sum method, are used.
Eleven genetic loci have reached genome-wide significance in a recent meta-analysis of genome-wide association studies in Parkinson disease (PD) based on populations of Caucasian descent. The extent to which these genetic effects are consistent across different populations is unknown.
Investigators from the Genetic Epidemiology of Parkinson's Disease Consortium were invited to participate in the study. A total of 11 SNPs were genotyped in 8,750 cases and 8,955 controls. Fixed as well as random effects models were used to provide the summary risk estimates for these variants. We evaluated between-study heterogeneity and heterogeneity between populations of different ancestry.
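For readers unfamiliar with the summary models mentioned above, the following Python sketch pools per-study log odds ratios with inverse-variance weights. The abstract does not name the random-effects estimator; the DerSimonian-Laird moment estimator is assumed here, and the function name and return keys are hypothetical.

```python
import math


def meta_analysis(log_ors, std_errs):
    """Pool per-study log odds ratios.

    Returns the fixed-effect and random-effects estimates (log scale),
    Cochran's Q, the I^2 heterogeneity statistic and tau^2.
    """
    w = [1.0 / se ** 2 for se in std_errs]  # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, log_ors)) / sw

    # Cochran's Q measures dispersion of study estimates around the pooled value.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_ors))
    df = len(log_ors) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0

    # Between-study variance (DerSimonian-Laird, assumed), truncated at zero.
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    w_re = [1.0 / (se ** 2 + tau2) for se in std_errs]
    random_effects = sum(wi * yi for wi, yi in zip(w_re, log_ors)) / sum(w_re)

    return {
        "fixed": fixed,
        "fixed_se": math.sqrt(1.0 / sw),
        "random": random_effects,
        "random_se": math.sqrt(1.0 / sum(w_re)),
        "Q": q,
        "I2": i2,
        "tau2": tau2,
    }
```

Exponentiating `fixed` or `random` gives the summary per-allele odds ratio; `I2` is reported as a proportion and can be multiplied by 100 to express it as a percentage.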
In the overall analysis, single nucleotide polymorphisms (SNPs) in 9 loci showed significant associations with protective per-allele odds ratios of 0.78–0.87 (LAMP3, BST1, and MAPT) and susceptibility per-allele odds ratios of 1.14–1.43 (STK39, GAK, SNCA, LRRK2, SYT11, and HIP1R). For 5 of the 9 replicated SNPs there was nominally significant between-site heterogeneity in the effect sizes (I² estimates ranged from 39% to 48%). Subgroup analysis by ethnicity showed significantly stronger effects for BST1 (rs11724635) in Asian vs Caucasian populations and similar effects for SNCA, LRRK2, LAMP3, HIP1R, and STK39 in Asian and Caucasian populations, while MAPT rs2942168 and SYT11 rs34372695 were monomorphic in the Asian population, highlighting the role of population-specific heterogeneity in PD.
Our study provides insight into the distribution of newly identified genetic factors contributing to PD and shows that large-scale evaluation in diverse populations is important for understanding the role of population-specific heterogeneity. Neurology® 2012;79:659–667
The prediction of the genetic disease risk of an individual is a powerful public health tool. While predicting risk has been successful in diseases which follow simple Mendelian inheritance, it has proven challenging in complex diseases for which a large number of loci contribute to the genetic variance. The large numbers of single nucleotide polymorphisms now available provide new opportunities for predicting genetic risk of complex diseases with high accuracy.
We have derived simple deterministic formulae to predict the accuracy of predicted genetic risk from population or case control studies using a genome-wide approach and assuming a dichotomous disease phenotype with an underlying continuous liability. We show that the prediction equations are special cases of the more general problem of predicting the accuracy of estimates of genetic values of a continuous phenotype. Our predictive equations are responsive to all parameters that affect accuracy and they are independent of allele frequency and effect distributions. Deterministic prediction errors when tested by simulation were generally small. The common link among the expressions for accuracy is that they are best summarized as the product of the ratio of number of phenotypic records per number of risk loci and the observed heritability.
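The abstract does not state the formulae themselves. As a hedged illustration of the summary quantity it describes, the sketch below assumes the commonly cited form for a continuous phenotype, r = sqrt(λh² / (λh² + 1)), where λ is the number of phenotypic records divided by the number of risk loci and h² is the heritability. The function name and argument names are hypothetical, and the adjustments for a dichotomous phenotype or case-control sampling are not reproduced.

```python
import math


def expected_accuracy(n_records, n_loci, h2):
    """Expected accuracy of predicted genetic values under the assumed form
    r = sqrt(lambda * h2 / (lambda * h2 + 1)), with lambda = n_records / n_loci."""
    if n_records < 0 or n_loci <= 0 or not 0.0 <= h2 <= 1.0:
        raise ValueError("need n_records >= 0, n_loci > 0 and 0 <= h2 <= 1")
    x = (n_records / n_loci) * h2  # the product highlighted in the abstract
    return math.sqrt(x / (x + 1.0))
```

Under this assumed form, accuracy depends on sample size, number of loci and heritability only through the single product λh², so doubling the number of records has the same effect as doubling the heritability.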
This study advances the understanding of the relative power of case control and population studies of disease. The predictions represent an upper bound of accuracy which may be achievable with improved effect estimation methods. The formulae derived will help researchers determine an appropriate sample size to attain a certain accuracy when predicting genetic risk.
Numerous founder mutations have been reported in BRCA1 and BRCA2. For genetic screening of a population with a founder mutation, testing can be targeted to the mutation, allowing for a more rapid and less expensive test. In addition, more precise estimates of the prior probability of carrying a mutation and of the likelihood of a mutation carrier developing cancer should be possible. For a given founder mutation a large number of carriers are available, so that focused scientific studies of penetrance, expression, and genetic and environmental modifiers of risk can be performed. Finally, founder populations may be a powerful resource to localize additional breast cancer susceptibility loci, because of the reduction in locus heterogeneity.
BRCA1; BRCA2; breast cancer genes; founder mutations; genetic epidemiology