Infection with Epstein-Barr virus (EBV) is highly prevalent worldwide, and it has been associated with infectious mononucleosis and severe diseases including Burkitt lymphoma, Hodgkin lymphoma, nasopharyngeal lymphoma, and lymphoproliferative disorders. Although EBV has been the focus of extensive research, much remains unknown concerning what makes some individuals more sensitive to infection and to adverse outcomes as a result of infection. Here we use an integrative genomics approach to localize genetic factors influencing levels of EBV nuclear antigen-1 (EBNA-1) IgG antibodies, as a measure of history of infection with this pathogen, in large Mexican American families. Genome-wide evidence of both significant linkage and association was obtained on chromosome 6 in the human leukocyte antigen (HLA) region and replicated in an independent Mexican American sample of large families (minimum p-value in combined analysis of both datasets is 1.4×10−15 for SNPs rs477515 and rs2516049). Conditional association analyses indicate the presence of at least two separate loci within MHC class II, and along with lymphocyte expression data suggest genes HLA-DRB1 and HLA-DQB1 as the best candidates. The association signals are specific to EBV and are not found with IgG antibodies to 12 other pathogens examined, and therefore do not simply reveal a general HLA effect. We investigated whether SNPs significantly associated with diseases in which EBV is known or suspected to play a role (namely nasopharyngeal lymphoma, Hodgkin lymphoma, systemic lupus erythematosus, and multiple sclerosis) also show evidence of association with EBNA-1 antibody levels, finding an overlap only for the HLA locus, but none elsewhere in the genome. The significance of this work is that a major locus related to EBV infection has been identified, which may ultimately reveal the underlying mechanisms by which the immune system regulates infection with this pathogen.
Many factors influence individual differences in susceptibility to infectious disease, including genetic factors of the host. Here we use several genome-wide investigative tools (linkage, association, joint linkage and association, and the analysis of gene expression data) to search for host genetic factors influencing Epstein-Barr virus (EBV) infection. EBV is a human herpes virus that infects up to 90% of adults worldwide, infection with which has been associated with severe complications including malignancies and autoimmune disorders. In a sample of >1,300 Mexican American family members, we found significant evidence of association of anti–EBV antibody levels with loci on chromosome 6 in the human leukocyte antigen region, which contains genes related to immune function. The top two independent loci in this region were HLA-DRB1 and HLA-DQB1, both of which are involved in the presentation of foreign antigens to T cells. This finding was specific to EBV and not to 12 other pathogens we examined. We also report an overlap of genetic factors influencing both EBV antibody level and EBV–related cancers and autoimmune disorders. This work demonstrates the presence of EBV susceptibility loci and provides impetus for further investigation to better understand the underlying mechanisms related to differences in disease progression among individuals infected with this pathogen.
Despite overwhelming evidence that major depression is highly heritable, recent studies have localized only a single depression-related locus reaching genome-wide significance and have yet to identify a causal gene. Focusing on family-based studies of quantitative intermediate phenotypes or endophenotypes, in tandem with studies of unrelated individuals using categorical diagnoses, should improve the likelihood of identifying major depression genes. However, there is currently no empirically derived, statistically rigorous method for selecting optimal endophenotypes for mental illnesses. Here we describe the Endophenotype Ranking Value (ERV), a new objective index of the genetic utility of endophenotypes for any heritable illness.
Applying ERV analysis to a high-dimensional set of over 11,000 traits drawn from behavioral/neurocognitive, neuroanatomic, and transcriptomic phenotypic domains, we identified a set of objective endophenotypes for recurrent major depression in a sample of Mexican American individuals (n=1122) from large randomly-selected extended pedigrees.
Top-ranked endophenotypes included the Beck Depression Inventory, bilateral ventral diencephalon volume and expression levels of the RNF123 transcript. To illustrate the utility of endophenotypes in this context, each of these traits was utilized along with disease status in bivariate linkage analysis. A genome-wide significant quantitative trait locus was localized on chromosome 4p15 (LOD=3.5) exhibiting pleiotropic effects on both the endophenotype (lymphocyte-derived expression levels of the RNF123 gene) and disease risk.
The wider use of quantitative endophenotypes, combined with unbiased methods for selecting among these measures, should spur new insights into the biological mechanisms that influence mental illnesses like major depression.
major depression; recurrent major depression; endophenotype; endophenotype ranking; linkage; family studies
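The abstract above does not give the ERV formula. As a minimal sketch, the index is assumed here to be the absolute value of the product of the square-root heritabilities of the illness and the endophenotype and their genetic correlation, which yields a value between 0 and 1. The function name, trait labels, and input values are hypothetical and are not taken from the study.

```python
import math


def endophenotype_ranking_value(h2_illness, h2_endo, rho_g):
    """Return |sqrt(h2_illness) * sqrt(h2_endo) * rho_g|.

    ASSUMPTION: this functional form is not stated in the abstract; it is
    used here only to illustrate how such a ranking index could work.
    """
    for name, h2 in (("h2_illness", h2_illness), ("h2_endo", h2_endo)):
        if not 0.0 <= h2 <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1]")
    if not -1.0 <= rho_g <= 1.0:
        raise ValueError("rho_g must lie in [-1, 1]")
    return abs(math.sqrt(h2_illness) * math.sqrt(h2_endo) * rho_g)


# Hypothetical candidates: (illness h2, endophenotype h2, genetic correlation).
candidates = {
    "trait_A": (0.40, 0.50, -0.60),
    "trait_B": (0.40, 0.80, 0.20),
    "trait_C": (0.40, 0.30, 0.90),
}

# Rank candidates from highest to lowest ERV.
ranked = sorted(
    candidates,
    key=lambda name: endophenotype_ranking_value(*candidates[name]),
    reverse=True,
)
```

Under this assumed form, a highly heritable trait with a weak genetic correlation to the illness (trait_B) ranks below a less heritable trait that shares more of its genetic basis with the illness (trait_C).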
To explore the genetic components of susceptibility to early childhood diarrhea (ECD), we used a quantitative genetic approach to estimate the heritability of ECD among children from two Brazilian favelas. Shared environment was used to model common exposure to environmental factors. Genetic relatedness was determined from pedigree information collected by screening household participants (n = 3,267) from two geographically related favelas located in Fortaleza, Brazil. There were 277 children within these pedigrees for whom diarrheal episodes in the first two years of life were recorded. Data on environmental exposure and pedigree relationship were combined to quantitatively partition phenotypic variance in ECD into environmental and genetic components by using a variance components approach as implemented in Sequential Oligogenic Linkage Analysis Routines program. Heritability accounted for 54% of variance in ECD and proximity of residence effect accounted for 21% (P < 0.0001). These findings suggest a substantial genetic component to ECD susceptibility and the potential importance of future genetics studies.
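The variance partition reported above can be illustrated with a minimal sketch of the expected phenotypic correlation between two relatives under a standard variance-components model: twice the kinship coefficient times the heritability, plus the shared-environment proportion when the pair shares that environment. The residual proportion (0.25) is obtained by subtraction and assumes no other variance components. Treating proximity of residence as a yes/no flag is a simplification made for illustration, and the function and variable names are hypothetical.

```python
# Variance proportions reported in the abstract.
H2 = 0.54               # additive genetic (heritability)
C2 = 0.21               # shared environment (proximity of residence)
E2 = 1.0 - H2 - C2      # residual, by subtraction (assumption: nothing else)

# Standard kinship coefficients for non-inbred relative pairs.
KINSHIP = {
    "full_sibs": 0.25,
    "half_sibs": 0.125,
    "first_cousins": 0.0625,
    "unrelated": 0.0,
}


def expected_correlation(relationship, shared_environment):
    """Expected trait correlation for a pair of distinct individuals."""
    genetic = 2.0 * KINSHIP[relationship] * H2
    environmental = C2 if shared_environment else 0.0
    return genetic + environmental


# Full siblings living in the same household vs. cousins living apart.
sib_corr = expected_correlation("full_sibs", shared_environment=True)
cousin_corr = expected_correlation("first_cousins", shared_environment=False)
```

With these proportions, co-resident full siblings are expected to correlate at 0.5 × 0.54 + 0.21 = 0.48, whereas first cousins living apart are expected to correlate at 0.125 × 0.54 = 0.0675.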
Although genetic influences on bipolar disorder are well established, localization of genes that predispose to the illness has proven difficult. Given that genes predisposing to bipolar disorder may be transmitted without expression of the categorical clinical phenotype, one strategy for identifying risk genes is the use of quantitative endophenotypes.
The goal of the current study is to adjudicate neurocognitive endophenotypes for bipolar disorder.
Design, Setting, and Participants
709 Latino individuals from the central valley of Costa Rica, Mexico City, Mexico, or San Antonio, Texas participated in the study. 660 of these persons were members of extended pedigrees with at least two siblings diagnosed with bipolar disorder (n=230). The remaining subjects were community controls drawn from each site and without personal or family history of bipolar disorder or schizophrenia. All subjects received psychodiagnostic interviews and comprehensive neurocognitive evaluations. Neurocognitive measures found to be heritable were entered into analyses designed to determine which tests are impaired in affected individuals, sensitive to genetic liability for the illness and genetically correlated with affection status.
Main Outcome Measures
The main outcome measure was neurocognitive test performance.
Two of the 21 neurocognitive variables were not significantly heritable and were excluded from subsequent analyses. Patients with bipolar disorder were impaired on 6 of these cognitive measures compared to non-related healthy subjects. Non-bipolar first-degree relatives were impaired on five of these and three tests were genetically correlated with affection status: digit symbol coding, object delayed response, and immediate facial memory.
This large-scale extended pedigree study of cognitive functioning in bipolar disorder identified measures of processing speed, working memory and declarative (facial) memory as candidate endophenotypes for bipolar disorder.
bipolar disorder; endophenotype; genetics; family studies; neurocognitive; neuropsychological
Host genetic factors exert significant influences on differential susceptibility to many infectious diseases. In addition, population structure of both host and parasite may influence disease distribution patterns. In this study, we assess the effects of population structure on infectious disease in two populations in which host genetic factors influencing susceptibility to parasitic disease have been extensively studied. The first population is the Jirel population of eastern Nepal that has been the subject of research on the determinants of differential susceptibility to soil-transmitted helminth infections. The second group is a Brazilian population residing in an area endemic for Trypanosoma cruzi infection that has been assessed for genetic influences on differential disease progression in Chagas disease. For measures of Ascaris worm burden, within-population host genetic effects are generally more important than host population structure factors in determining patterns of infectious disease. No significant influences of population structure on measures associated with progression of cardiac disease in individuals who were seropositive for T. cruzi infection were found.
population structure; genetics of infectious disease susceptibility; intestinal worms; Chagas disease
Elucidating the genetic architecture of preeclampsia is a major goal in obstetric medicine. We have performed a genome-wide association study (GWAS) for preeclampsia in unrelated Australian individuals of Caucasian ancestry using the Illumina OmniExpress-12 BeadChip to successfully genotype 648,175 SNPs in 538 preeclampsia cases and 540 normal pregnancy controls. Two SNP associations (rs7579169, p = 3.58×10−7, OR = 1.57; rs12711941, p = 4.26×10−7, OR = 1.56) satisfied our genome-wide significance threshold (modified Bonferroni p<5.11×10−7). These SNPs reside in an intergenic region less than 15 kb downstream from the 3′ terminus of the Inhibin, beta B (INHBB) gene on 2q14.2. They are in linkage disequilibrium (LD) with each other (r2 = 0.92), but not (r2<0.80) with any other genotyped SNP ±250 kb. DNA re-sequencing in and around the INHBB structural gene identified an additional 25 variants. Of the 21 variants that we successfully genotyped back in the case-control cohort the most significant association observed was for a third intergenic SNP (rs7576192, p = 1.48×10−7, OR = 1.59) in strong LD with the two significant GWAS SNPs (r2>0.92). We attempted to provide evidence of a putative regulatory role for these SNPs using bioinformatic analyses and found that they all reside within regions of low sequence conservation and/or low complexity, suggesting functional importance is low. We also explored the mRNA expression in decidua of genes ±500 kb of INHBB and found a nominally significant correlation between a transcript encoded by the EPB41L5 gene, ∼250 kb centromeric to INHBB, and preeclampsia (p = 0.03). We were unable to replicate the associations shown by the significant GWAS SNPs in case-control cohorts from Norway and Finland, leading us to conclude that it is more likely that these SNPs are in LD with as yet unidentified causal variant(s).
A combined genome-wide association and linkage study was used to identify loci causing variation in cystic fibrosis (CF) lung disease severity. A significant association (P=3.34 × 10−8) near EHF and APIP (chr11p13) was identified in F508del homozygotes (n=1,978). The association replicated in F508del homozygotes (P=0.006) from a separate family-based study (n=557), with P=1.49 × 10−9 for the three-study joint meta-analysis. Linkage analysis of 486 sibling pairs from the family-based study identified a significant QTL on chromosome 20q13.2 (LOD=5.03). Our findings provide insight into the causes of variation in lung disease severity in CF and suggest new therapeutic targets for this life-limiting disorder.
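To put a LOD score such as the 5.03 reported above on a more familiar scale, the sketch below converts a LOD to an approximate pointwise p-value. It assumes a one-parameter linkage test whose likelihood-ratio statistic, 2·ln(10)·LOD, follows a 50:50 mixture of a point mass at zero and a chi-square distribution with one degree of freedom. The abstract does not state which linkage test was used, so this is an illustration rather than a reconstruction of the authors' calculation, and the result is not adjusted for genome-wide multiple testing. The function name is hypothetical.

```python
import math


def lod_to_pointwise_p(lod):
    """Approximate pointwise p-value for a positive LOD score.

    ASSUMPTION: one-sided test with a 1/2 chi2(0) + 1/2 chi2(1) null.
    """
    if lod <= 0:
        raise ValueError("lod must be positive")
    chi2 = 2.0 * math.log(10.0) * lod
    # Survival function of chi-square with 1 df is erfc(sqrt(x / 2)).
    return 0.5 * math.erfc(math.sqrt(chi2 / 2.0))


p_conventional = lod_to_pointwise_p(3.0)   # the classical LOD = 3 threshold
p_reported = lod_to_pointwise_p(5.03)      # LOD reported for 20q13.2
```

Under these assumptions a LOD of 3 corresponds to a pointwise p-value of roughly 1 in 10,000, and larger LOD scores map to correspondingly smaller values.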
The phenomenon of synthetic association raises the possibility that common variant genetic markers may be coupled with functional rare variants sufficiently often to allow the rare variants to be tagged by the common ones. Using human exome sequence data from the 1000 Genomes Project, two investigative teams in Group 12 of Genetic Analysis Workshop 17 found that stochastic coupling between rare and common variants does occur, although perhaps not sufficiently often that we can expect common variant signals to reflect synthetic association; other teams considered methods for detecting association using both rare and common variants. Common themes were that synthetic association is more apparent in population strata (ancestral or familial) and that careful selection of the unit of analysis (gene, gene network, or other genomic subset) is likely to be crucial to the discovery of rare variants that contribute to risk of disease.
synthetic association; rare variants; association; identity by state
Genetic Analysis Workshop 17 (GAW17) provided a platform for evaluating existing statistical genetic methods and for developing novel methods to analyze rare variants that modulate complex traits. In this article, we present an overview of the 1000 Genomes Project exome data and simulated phenotype data that were distributed to GAW17 participants for analyses, the different issues addressed by the participants, and the process of preparation of manuscripts resulting from the discussions during the workshop.
The data set simulated for Genetic Analysis Workshop 17 was designed to mimic a subset of data that might be produced in a full exome screen for a complex disorder and related risk factors in order to permit workshop participants to investigate issues of study design and statistical genetic analysis. Real sequence data from the 1000 Genomes Project formed the basis for simulating a common disease trait with a prevalence of 30% and three related quantitative risk factors in a sample of 697 unrelated individuals and a second sample of 697 individuals in large, extended pedigrees. Called genotypes for 24,487 autosomal markers assigned to 3,205 genes and simulated affection status, quantitative traits, age, sex, pedigree relationships, and cigarette smoking were provided to workshop participants. The simulating model included both common and rare variants with minor allele frequencies ranging from 0.07% to 25.8% and a wide range of effect sizes for these variants. Genotype-smoking interaction effects were included for variants in one gene. Functional variants were concentrated in genes selected from specific biological pathways and were selected on the basis of the predicted deleteriousness of the coding change. For each sample, unrelated individuals and family, 200 replicates of the phenotypes were simulated.
Joint analyses of correlated phenotypes in genetic epidemiology studies are common. However, these analyses primarily focus on genetic correlation between traits and do not take into account environmental correlation. We describe a method that optimizes the genetic signal by accounting for stochastic environmental noise through joint analysis of a discrete trait and a correlated quantitative marker. We conducted bivariate analyses where heritability and the environmental correlation between the discrete and quantitative traits were calculated using Genetic Analysis Workshop 17 (GAW17) family data. The resulting inverse value of the environmental correlation between these traits was then used to determine a new β coefficient for each quantitative trait and was constrained in a univariate model. We conducted genetic association tests on 7,087 nonsynonymous SNPs in three GAW17 family replicates for Affected status with the β coefficient fixed for three quantitative phenotypes and compared these to an association model where the β coefficient was allowed to vary. Bivariate environmental correlations were 0.64 (± 0.09) for Q1, 0.798 (± 0.076) for Q2, and −0.169 (± 0.18) for Q4. Heritability of Affected status improved in each univariate model where a constrained β coefficient was used to account for stochastic environmental effects. No genome-wide significant associations were identified for either method but we demonstrated that constraining β for covariates slightly improved the genetic signal for Affected status. This environmental regression approach allows for increased heritability when the β coefficient for a highly correlated quantitative covariate is constrained and increases the genetic signal for the discrete trait.
The synthetic association hypothesis proposes that common genetic variants detectable in genome-wide association studies may reflect the net phenotypic effect of multiple rare polymorphisms distributed broadly within the focal gene rather than, as often assumed, the effect of common functional variants in high linkage disequilibrium with the focal marker. In a recent study, Dickson and colleagues demonstrated synthetic association in simulations and in two well-characterized, highly polymorphic human disease genes. The converse of this hypothesis is that rare variant genotypes must be correlated with common variant genotypes often enough to make the phenomenon of synthetic association possible. Here we used the exome genotype data provided for Genetic Analysis Workshop 17 to ask how often, how well, and under what conditions rare variant genotypes predict the genotypes of common variants within the same gene. We found nominal evidence of correlation between rare and common variants in 21-30% of cases examined for unrelated individuals; this rate increased to 38-44% for related individuals, underscoring the segregation that underlies synthetic association.
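The question of how well a rare variant genotype predicts a common variant genotype in the same gene can be illustrated with a simple correlation of allele dosages (0, 1, or 2 copies of the minor allele). The abstract does not state which statistic was used, so the Pearson correlation below is an assumption, and the genotype vectors are invented for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length genotype dosage vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("vectors must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a monomorphic variant has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical dosages for ten individuals (not GAW17 data).
rare_variant = [0, 0, 0, 0, 0, 0, 1, 0, 0, 1]
common_variant = [0, 1, 0, 1, 2, 0, 2, 1, 0, 2]

r = pearson_r(rare_variant, common_variant)
r_squared = r * r
```

In this toy example the two carriers of the rare allele are both homozygous for the common minor allele, so the dosages are positively correlated even though most carriers of the common allele lack the rare one.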
Heart rate (HR) has been identified as a risk factor for cardiovascular disease (CVD), yet little is known regarding genetic factors influencing this phenotype. Previous research in American Indians (AIs) from the Strong Heart Family Study (SHFS) identified a significant quantitative trait locus (QTL) for HR on chromosome 9p21. A genetic association analysis of HR was conducted in the SHFS. HR was measured from electrocardiogram (ECG) and echocardiograph (Echo) Doppler recordings. We examined 2248 single-nucleotide polymorphisms (SNPs) on chromosome 9p21 for association using a gene-centric statistical test. We replicated the aforementioned QTL [logarithm of odds (LOD) = 4.83; genome-wide P = 0.0003] on chromosome 9p21 in one SHFS population using joint linkage of ECG and Echo HR. After correcting for effective number of SNPs using a gene-centric test, six SNPs (rs7875153, rs7848524, rs4446809, rs10964759, rs1125488 and rs7853123) remained significant. We applied a novel bivariate association method, which was a joint test of association of a single locus to two traits using a standard additive genetic model. The SNP rs7875153 provided the strongest evidence for association (P = 7.14 × 10−6). This SNP is rare (minor allele frequency = 0.02) in AIs and is located within intron 9 of the gene KIAA1797. To support this association, we applied lymphocyte RNA expression data from the San Antonio Family Heart Study, a longitudinal study of CVD in Mexican Americans. Expression levels of KIAA1797 were significantly associated (P = 0.012) with HR. These findings in independent populations support the hypothesis that KIAA1797 genetic variation may be associated with HR, but elucidation of a functional relationship requires additional study.
Objectives: The thickness of the brain’s cortical gray matter (GM) and the fractional anisotropy (FA) of the cerebral white matter (WM) each follow an inverted U-shape trajectory with age. The two measures are positively correlated and may be modulated by common biological mechanisms. We employed four types of genetic analyses to localize individual genes acting pleiotropically upon these phenotypes. Methods: Whole-brain and regional GM thickness and FA values were measured from high-resolution anatomical and diffusion tensor MR images collected from 712 Mexican American participants (438 females, age = 47.9 ± 13.2 years) recruited from 73 (9.7 ± 9.3 individuals/family) large families. The significance of the correlation between two traits was estimated using a bivariate genetic correlation analysis. Localization of chromosomal regions that jointly influenced both traits was performed using whole-genome quantitative trait loci (QTL) analysis. Gene localization was performed using SNP genotyping on the Illumina 1M chip and correlation with leukocyte-based gene-expression analyses. Gene expression was measured using the Illumina BeadChip. These data were available for 371 subjects. Results: Significant genetic correlation was observed between GM thickness and FA values. Significant logarithm of odds (LOD ≥ 3.0) QTLs were localized within chromosome 15q22–23. More detailed localization reported no significant association (p < 5·10−5) for 1565 SNPs located within the QTLs. Post hoc analysis indicated that 40% of the potentially significant (p ≤ 10−3) SNPs were localized to the RAR-related orphan receptor alpha (RORA) and NARG2 genes. A potentially significant association was observed for the rs2456930 polymorphism reported as a significant GWAS finding in Alzheimer’s disease neuroimaging initiative subjects. The expression levels for RORA and ADAM10 genes were significantly (p < 0.05) correlated with both FA and GM thickness. NARG2 expression was significantly correlated with GM thickness (p < 0.05) but failed to show a significant correlation (p = 0.09) with FA. Discussion: This study identified a novel, significant QTL at 15q22–23. SNP correlation with gene-expression analyses indicated that RORA, NARG2, and ADAM10 jointly influence GM thickness and WM–FA values.
imaging genetics; cortical thickness; WM integrity; genetic correlation; GWAS; QTL; RORA; ADAM10
The complex etiology of common diseases like cardiovascular disease, diabetes, hypertension, and rheumatoid arthritis has led investigators to focus on the genetics of correlated phenotypes and risk factors. Joint analysis of multiple disease-related phenotypes may reveal genes of pleiotropic effect and increase analytical power, but at the cost of increased analytical and computational complexity. All three data sets provided for analysis at the Genetic Analysis Workshop 16 offered multiple quantitative measures of phenotypes related to underlying disease processes as well as discrete measures of affection status. Participants in Group 6 addressed the challenges and possibilities of association analysis of these data sets on multiple levels, including phenotype definition and data reduction, multivariate approaches to gene discovery, analysis of causality and data structure, and development of predictive models. These approaches included combinations of continuous and discrete phenotypes, use of repeated measures in longitudinal data, and models that included multiple phenotypic measures and multiple single-nucleotide polymorphism variants. Most research teams regarded the use of multiple related phenotypes as a tool for increasing analytical power, as well as for clarifying the underlying biology of complex diseases.
multivariate analyses; quantitative traits; longitudinal data; instrumental variables; association analysis; genetic risk scores; data reduction; correlation
This investigation offers insights into system-wide pathological processes induced in response to cigarette smoke exposure by determining its influences at the gene expression level.
We obtained genome-wide quantitative transcriptional profiles from 1,240 individuals from the San Antonio Family Heart Study, including 297 current smokers. Using lymphocyte samples, we identified 20,413 transcripts with significantly detectable expression levels, including both known and predicted genes. Correlation between smoking and gene expression levels was determined using a regression model that allows for residual genetic effects.
With a conservative false-discovery rate of 5% we identified 323 unique genes (342 transcripts) whose expression levels were significantly correlated with smoking behavior. These genes showed significant over-representation within a range of functional categories that correspond well with known smoking-related pathologies, including immune response, cell death, cancer, natural killer cell signaling and xenobiotic metabolism.
Our results indicate that not only individual genes but entire networks of gene interaction are influenced by cigarette smoking. This is the largest in vivo transcriptomic epidemiological study of smoking to date and reveals the significant and comprehensive influence of cigarette smoke, as an environmental variable, on the expression of genes. The central importance of this manuscript is to provide a summary of the relationships between gene expression and smoking in this exceptionally large cross-sectional data set.
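Selecting transcripts at a 5% false-discovery rate, as described above, can be sketched with the Benjamini-Hochberg step-up procedure. The abstract does not say which FDR method was used, so the choice of Benjamini-Hochberg is an assumption; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans marking which tests are rejected at FDR q.

    ASSUMPTION: Benjamini-Hochberg step-up procedure; the study's actual
    FDR method is not specified in the abstract.
    """
    m = len(p_values)
    # Indices of the tests sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= k * q / m.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            cutoff_rank = rank
    # Reject every test ranked at or below the cutoff.
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values for ten transcripts.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
flags = benjamini_hochberg(example_p, q=0.05)
n_significant = sum(flags)
```

In this example only the two smallest p-values pass: 0.001 is below 1 × 0.05 / 10 and 0.008 is below 2 × 0.05 / 10, while no larger p-value falls below its rank-specific threshold.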
Many phenotypes may be influenced by the prenatal environment of the mother and/or maternal care, and these maternal effects may have a heritable component. We have implemented in the computer program SOLAR a variance components-based method for detecting indirect effects of maternal genotype on offspring phenotype. Of six phenotypes measured in three generations of the Framingham Heart Study, height showed the strongest evidence (P = 0.02) of maternal effect. We conducted a genome-wide association analysis for height, testing both the direct effect of the focal individual's genotype and the indirect effect of the maternal genotype. Offspring height showed suggestive evidence of association with maternal genotype for two single-nucleotide polymorphisms in the trafficking protein particle complex 9 gene TRAPPC9 (NIBP), which plays a role in neuronal NF-κB signalling. This work establishes a methodological framework for identifying genetic variants that may influence the contribution of the maternal environment to offspring phenotypes.
Circulating soluble intercellular adhesion molecule-1 (sICAM-1) is a biochemical marker of inflammation. We performed variance-components-based quantitative genetic analyses in SOLAR of sICAM-1 in 1170 individuals from Mexican American families in the San Antonio Family Heart Study. The trait is heritable (h2 = 0.50±0.06, P<10−6). Multipoint linkage analysis using a ∼10-cM microsatellite map revealed a region on Chromosome 19p near marker D19S586 showing strong evidence of linkage for sICAM-1 (empirically adjusted univariate-equivalent LOD = 4.95), coincident with the structural gene ICAM1. This region has been identified previously as a QTL for inflammatory, autoimmune, and metabolic syndrome traits. There is significant evidence (P=0.0023) of locus heterogeneity for sICAM-1 in this sample: a subset of pedigrees contributes most of the linkage signal for sICAM-1 on Chromosome 19, suggesting a logical focus for future genetic dissection of the trait.
ICAM-1; inflammation; genetic heterogeneity; genome scan; quantitative trait locus; Mexican Americans
Circadian (∼24 hr) rhythms are generated by the central pacemaker localized to the suprachiasmatic nucleus (SCN) of the hypothalamus. Although the basis for intrinsic rhythmicity is generally understood to rely on transcription factors encoded by “clock genes”, less is known about the daily regulation of SCN neuronal activity patterns that communicate a circadian time signal to downstream behaviors and physiological systems. Action potentials in the SCN are necessary for the circadian timing of behavior, and individual SCN neurons modulate their spontaneous firing rate (SFR) over the daily cycle, suggesting that the circadian patterning of neuronal activity is necessary for normal behavioral rhythm expression. The BK K+ channel plays an important role in suppressing spontaneous firing at night in SCN neurons. Deletion of the Kcnma1 gene, encoding the BK channel, causes degradation of circadian behavioral and physiological rhythms.
To test the hypothesis that loss of robust behavioral rhythmicity in Kcnma1−/− mice is due to the disruption of SFR rhythms in the SCN, we used multi-electrode arrays to record extracellular action potentials from acute wild-type (WT) and Kcnma1−/− slices. Patterns of activity in the SCN were tracked simultaneously for up to 3 days, and the phase, period, and synchronization of SFR rhythms were examined. Loss of BK channels increased arrhythmicity but also altered the amplitude and period of rhythmic activity. Unexpectedly, Kcnma1−/− SCNs showed increased variability in the timing of the daily SFR peak.
These results suggest that BK channels regulate multiple aspects of the circadian patterning of neuronal activity in the SCN. In addition, these data illustrate the characteristics of a disrupted SCN rhythm downstream of clock gene-mediated timekeeping and its relationship to behavioral rhythms.
We report a simple and rapid method for detecting additive genetic variance due to X-linked loci in the absence of marker data for this chromosome. We examined the interaction of this method with an established method for detecting mitochondrial linkage (another source of sex-asymmetric genetic covariance). When applied to data from the Collaborative Study on the Genetics of Alcoholism, this method found evidence of X-chromosomal linkage for one continuous trait (ntth1) and one discrete trait (SPENT). Evidence of mitochondrial contribution was found for one discrete trait (CRAVING) and three continuous traits (ln(CIGPKYR), ecb21, and tth1). Results for ntth1 suggest that methods that do not also allow for male-female heterogeneity in environmental variance may be overly conservative in detection of X-chromosomal effects.