As a first step toward understanding how rare variants contribute to risk for complex diseases, we sequenced 15,585 human protein-coding genes to an average median depth of 111× in 2440 individuals of European (n = 1351) and African (n = 1088) ancestry. We identified over 500,000 single-nucleotide variants (SNVs), the majority of which were rare (86% with a minor allele frequency less than 0.5%), previously unknown (82%), and population-specific (82%). On average, 2.3% of the 13,595 SNVs each person carried were predicted to affect protein function of ∼313 genes per genome, and ∼95.7% of SNVs predicted to be functionally important were rare. This excess of rare functional variants is due to the combined effects of explosive, recent accelerated population growth and weak purifying selection. Furthermore, we show that large sample sizes will be required to associate rare variants with complex traits.
Establishing the age of each mutation segregating in contemporary human populations is important to fully understand our evolutionary history1,2 and will help facilitate the development of new approaches for disease gene discovery3. Large-scale surveys of human genetic variation have reported signatures of recent explosive population growth4-6, notable for an excess of rare genetic variants, qualitatively suggesting that many mutations arose recently. To more quantitatively assess the distribution of mutation ages, we resequenced 15,336 genes in 6,515 individuals of European (n=4,298) and African (n=2,217) American ancestry and inferred the age of 1,146,401 autosomal single nucleotide variants (SNVs). We estimate that ~73% of all protein-coding SNVs and ~86% of SNVs predicted to be deleterious arose in the past 5,000-10,000 years. The average age of deleterious SNVs varied significantly across molecular pathways, and disease genes contained a significantly higher proportion of recently arisen deleterious SNVs compared to other genes. Furthermore, European Americans had an excess of deleterious variants in essential and Mendelian disease genes compared to African Americans, consistent with weaker purifying selection due to the out-of-Africa dispersal. Our results better delimit the historical details of human protein-coding variation, illustrate the profound effect recent human history has had on the burden of deleterious SNVs segregating in contemporary populations, and provide important practical information that can be used to prioritize variants in disease gene discovery.
Toll-like receptor (TLR)-mediated innate immune responses are important in early host defense. Using a candidate gene approach, we previously identified genetic variation within TLR1 that is associated with hyper-responsiveness to a TLR1/2 agonist in vitro and with death and organ dysfunction in patients with sepsis. Here we report a genome-wide association study designed to identify genetic loci controlling whole blood cytokine responses to the TLR1/2 lipopeptide agonist, Pam3CSK4, ex vivo. We identified a very strong association (p<1×10^−27) between genetic variation within the TLR10/1/6 locus on chromosome 4, and Pam3CSK4-induced cytokine responses. This was the predominant association explaining over 35% of the population variance for this phenotype. Notably, strong associations were observed within TLR10 suggesting genetic variation in TLR10 may influence bacterial lipoprotein-induced responses. These findings establish the TLR10/1/6 locus as the dominant common genetic factor controlling inter-individual variability in Pam3CSK4-induced whole blood responses in the healthy population.
TLR; polymorphism; genomics; innate immunity
Familial dyskinesia with facial myokymia (FDFM) is an autosomal dominant disorder that is exacerbated by anxiety. In a five-generation family of German ancestry we previously mapped FDFM to chromosome 3p21-3q21. The 72.5 Mbp linkage region was too large for traditional positional mutation identification.
To identify the gene responsible for FDFM by exome resequencing of a single affected individual.
Design, Setting and Participants
We performed whole exome sequencing in one affected individual and used a series of bioinformatic filters, including functional significance and presence in dbSNP or 1000 Genomes project, to reduce the number of candidate variants. Co-segregation analysis was performed in 15 additional individuals in three generations.
The exome contained 23428 single nucleotide variants, of which 9391 were missense, nonsense or splice site alterations. The critical region contained 323 variants, five of which were not present in one of the sequence-databases. Adenylate cyclase 5 (ADCY5) was the only gene in which the variant (c.2176G>A) was co-transmitted perfectly with disease status and was not present in 3510 control Caucasian exomes. This residue is highly conserved and the change is nonconservative and predicted to be damaging.
ADCY5 is highly expressed in striatum. Mice deficient in Adcy5 develop a movement disorder that is worsened by stress. We conclude that FDFM likely results from a missense mutation in ADCY5. This study demonstrates the power of a single exome sequence in combination with linkage information to identify causative genes for rare autosomal dominant Mendelian diseases.
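The filtering cascade described above (protein-altering variants, restricted to the linkage region, absent from public databases) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `Variant` record, the `filter_candidates` function, the gene names, and the coordinates are all hypothetical, and the co-segregation step in additional family members is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int
    gene: str
    effect: str         # e.g. "missense", "nonsense", "splice", "synonymous"
    in_public_db: bool  # seen in dbSNP or 1000 Genomes (assumed pre-annotated)


# Effect classes treated as potentially functional (assumption for this sketch).
FUNCTIONAL = {"missense", "nonsense", "splice"}


def filter_candidates(variants, region):
    """Keep functional variants inside the linkage region that are not in public databases."""
    chrom, start, end = region
    return [
        v for v in variants
        if v.effect in FUNCTIONAL
        and v.chrom == chrom and start <= v.pos <= end
        and not v.in_public_db
    ]


# Toy data with made-up genes and coordinates.
toy_variants = [
    Variant("3", 120, "GENE_A", "missense", False),    # passes every filter
    Variant("3", 130, "GENE_B", "missense", True),     # already in a public database
    Variant("3", 140, "GENE_C", "synonymous", False),  # not protein-altering
    Variant("5", 125, "GENE_D", "nonsense", False),    # wrong chromosome
    Variant("3", 900, "GENE_E", "splice", False),      # outside the linkage region
]
candidates = filter_candidates(toy_variants, region=("3", 100, 200))
print([v.gene for v in candidates])
```

Each filter is independent, so the order in which they are applied changes only the intermediate counts, not the final candidate list.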
Human exome sequencing is a recently developed tool to aid in the discovery of novel coding variants. Now broadly applied, exome sequencing datasets provide a novel opportunity to evaluate the allele frequencies of previously published pathogenic rare variants.
Methods and Results
We examined the exome dataset from the NHLBI Exome Sequencing Project (ESP) and compared this dataset with a catalog of 197 previously published rare variants reported as causative of dilated cardiomyopathy (DCM) from familial and sporadic cases. Of these 197, 33 (16.8%) were also present in the ESP database, raising the question of whether they were uncommon polymorphisms. Supporting functional data has been published for 14 of the 33 (42%), suggesting they are unlikely to be false positives. The frequencies of these functional variants in the ESP dataset ranged from 0.02–1.33% (median 0.04%), which when applied as a cut-off to filter variants in a DCM pedigree identified an additional DCM candidate gene. A greater proportion of sporadic DCM cases had variants that were present in the ESP dataset vs novel variants (i.e. not in ESP; 44% vs 21%, p=0.002), suggesting some of the variants identified as disease causing in sporadic DCM are either false positives or low penetrance alleles in human populations.
Rare nonsynonymous variants identified in DCM subjects that are also present at very low frequencies in public databases are likely relevant for DCM. Allele frequencies >0.04% are of less certain pathogenicity, especially if identified in sporadic cases, although this cut-off should be viewed as preliminary.
cardiomyopathy; genetics; genes
It is well established that autism spectrum disorders (ASD) have a strong genetic component. However, for at least 70% of cases, the underlying genetic cause is unknown1. Under the hypothesis that de novo mutations underlie a substantial fraction of the risk for developing ASD in families with no previous history of ASD or related phenotypes (so-called sporadic or simplex families2,3), we sequenced all coding regions of the genome, i.e. the exome, for parent-child trios exhibiting sporadic ASD, including 189 new trios and 20 previously reported4. We also sequenced the exomes of 50 unaffected siblings corresponding to these new (n = 31) and previously reported trios (n = 19)4, for a total of 677 individual exomes from 209 families. Here we show de novo point mutations are overwhelmingly paternal in origin (4:1 bias) and positively correlated with paternal age, consistent with the modest increased risk for children of older fathers to develop ASD5. Moreover, 39% (49/126) of the most severe or disruptive de novo mutations map to a highly interconnected beta-catenin/chromatin remodeling protein network ranked significantly for autism candidate genes. In proband exomes, recurrent protein-altering mutations were observed in two genes, CHD8 and NTNG1. Mutation screening of six candidate genes in 1,703 ASD probands identified additional de novo, protein-altering mutations in GRIN2B, LAMC3, and SCN1A. Combined with copy number variant (CNV) data, these results suggest extreme locus heterogeneity but also provide a target for future discovery, diagnostics, and therapeutics.
Adverse neurodevelopmental sequelae are reported among children who undergo early cardiac surgery to repair congenital heart defects (CHD). APOE genotype has previously been determined to contribute to the prediction of these outcomes. Understanding further genetic causes for the development of poor neurobehavioral outcomes should enhance patient risk stratification and improve both prevention and treatment strategies.
We performed a prospective observational study of children who underwent cardiac surgery before six months of age; this included a neurodevelopmental evaluation between their fourth and fifth birthdays. Attention and behavioral skills were assessed through parental report utilizing the Attention Deficit-Hyperactivity Disorder-IV scale preschool edition (ADHD-IV), and Child Behavior Checklist (CBCL/1.5-5), respectively. Of the seven investigated, three neurodevelopmental phenotypes met genomic quality control criteria. Linear regression was performed to determine the effect of genome-wide genetic variation on these three neurodevelopmental measures in 316 subjects.
This genome-wide association study identified single nucleotide polymorphisms (SNPs) associated with three neurobehavioral phenotypes in the postoperative children: ADHD-IV Impulsivity/Hyperactivity, CBCL/1.5-5 PDPs, and CBCL/1.5-5 Total Problems. The most predictive SNPs for each phenotype were: an LGALS8 intronic SNP, rs4659682, associated with ADHD-IV Impulsivity (P = 1.03×10^−6); a PCSK5 intronic SNP, rs2261722, associated with CBCL/1.5-5 PDPs (P = 1.11×10^−6); and an intergenic SNP, rs11617488, 50 kb from FGF9, associated with CBCL/1.5-5 Total Problems (P = 3.47×10^−7). Ten SNPs (3 for ADHD-IV Impulsivity, 5 for CBCL/1.5-5 PDPs, and 2 for CBCL/1.5-5 Total Problems) had p<10^−5.
No SNPs met genome-wide significance for our three neurobehavioral phenotypes; however, 10 SNPs reached a threshold for suggestive significance (p<10^−5). Given the unique nature of this cohort, larger studies and/or replication are not possible. Studies to further investigate the mechanisms through which these newly identified genes may influence neurodevelopmental dysfunction are warranted.
Polymorphisms within the ICAM1 structural gene have been shown to influence circulating levels of soluble intercellular adhesion molecule-1 (sICAM-1), but their relation to atherosclerosis has not been clearly established. We sought to determine whether ICAM1 SNPs are associated with circulating sICAM-1 concentration, coronary artery calcium (CAC), and common and internal carotid intima medial thickness (IMT).
Methods and Results
3,550 black and white Coronary Artery Risk Development in Young Adults (CARDIA) Study subjects who participated in the year 15 and/or 20 examinations and were part of the Young Adult Longitudinal Study of Antioxidants (YALTA) ancillary study were included in this analysis. In whites, rs5498 was significantly associated with sICAM-1 (p < 0.001) and each G-allele of rs5498 was associated with 5% higher sICAM-1 concentration. In blacks, each C-allele of rs5490 was associated with 6% higher sICAM-1 level; this SNP was in strong linkage disequilibrium with rs5491, a functional variant. Subclinical measurements of atherosclerosis in either year 15 or year 20 were not significantly related to ICAM1 SNPs.
In CARDIA, ICAM1 DNA segment variants were associated with sICAM-1 protein level including the novel finding that levels differ by the functional variant rs5491. However, ICAM1 SNPs were not strongly related to either IMT or CAC. Our findings in CARDIA suggest that ICAM1 variants are not major early contributors to subclinical atherosclerosis.
cell adhesion molecules; atherosclerosis; coronary calcium; genetics; inflammation
Kabuki syndrome is a rare, multiple malformation disorder characterized by a distinctive facial appearance, cardiac anomalies, skeletal abnormalities, and mild to moderate intellectual disability. Simplex cases make up the vast majority of the reported cases with Kabuki syndrome, but parent-to-child transmission in more than a half-dozen instances indicates that it is an autosomal dominant disorder. We recently reported that Kabuki syndrome is caused by mutations in MLL2, a gene that encodes a Trithorax-group histone methyltransferase, a protein important in the epigenetic control of active chromatin states. Here, we report on the screening of 110 families with Kabuki syndrome. MLL2 mutations were found in 81/110 (74%) of families. In simplex cases for which DNA was available from both parents, 25 mutations were confirmed to be de novo, while a transmitted MLL2 mutation was found in two of three familial cases. The majority of variants found to cause Kabuki syndrome were novel nonsense or frameshift mutations that are predicted to result in haploinsufficiency. The clinical characteristics of MLL2 mutation-positive cases did not differ significantly from MLL2 mutation-negative cases with the exception that renal anomalies were more common in MLL2 mutation-positive cases. These results are important for understanding the phenotypic consequences of MLL2 mutations for individuals and their families as well as for providing a basis for the identification of additional genes for Kabuki syndrome.
Kabuki syndrome; MLL2; ALR; Trithorax group histone methyltransferase
Structural variations in the chromosome 22q11.2 region mediated by non-allelic homologous recombination result in 22q11.2 deletion (del22q11.2) and 22q11.2 duplication (dup22q11.2) syndromes. The majority of del22q11.2 cases have facial and cardiac malformations, immunologic impairments, specific cognitive profile and increased risk for schizophrenia and autism spectrum disorders. The phenotype of dup22q11.2 is frequently without physical features but includes the spectrum of neurocognitive abnormalities. Although there is substantial evidence that haploinsufficiency for TBX1 plays a role in the physical features of del22q11.2, it is not known which gene(s) in the critical 1.5 Mb region are responsible for the observed spectrum of behavioral phenotypes. We identified an individual with a balanced translocation 46,XY,t(1;22)(p36.1;q11.2) and a behavioral phenotype characterized by cognitive impairment, autism and schizophrenia in the absence of congenital malformations. Using somatic cell hybrids and comparative genomic hybridization we mapped the chromosome-22 breakpoint within intron 7 of the GNB1L gene. Copy number evaluations and direct DNA sequencing of GNB1L in 271 schizophrenia and 513 autism cases revealed dup22q11.2 in two families with autism and private GNB1L missense variants in conserved residues in three families (p=0.036). The identified missense variants affect residues in the WD40 repeat domains and are predicted to have deleterious effects on the protein. Prior studies provided evidence that GNB1L may have a role in schizophrenia. Our findings support involvement of GNB1L in autism spectrum disorders as well.
22q11.2; translocation; neurodevelopmental disorders
Evidence for the etiology of autism spectrum disorders (ASD) has consistently pointed to a strong genetic component complicated by substantial locus heterogeneity1,2. We sequenced the exomes of 20 sporadic cases of ASD and their parents, reasoning that these families would be enriched for de novo mutations of major effect. We identified 21 de novo mutations, of which 11 were protein-altering. Protein-altering mutations were significantly enriched for changes at highly conserved residues. We identified potentially causative de novo events in 4/20 probands, particularly among more severely affected individuals, in FOXP1, GRIN2B, SCN1A, and LAMC3. In the FOXP1 mutation carrier, we also observed a rare inherited CNTNAP2 mutation and provide functional support for a multihit model for disease risk3. Our results demonstrate that trio-based exome sequencing is a powerful approach for identifying novel candidate genes for ASD and suggest that de novo mutations may contribute substantially to the genetic risk for ASD.
Massively parallel sequencing has enabled the rapid, systematic identification of variants on a large scale. This has, in turn, accelerated the pace of gene discovery and disease diagnosis on a molecular level and has the potential to revolutionize methods particularly for the analysis of Mendelian disease. Using massively parallel sequencing has enabled investigators to interrogate variants both in the context of linkage intervals and also on a genome-wide scale, in the absence of linkage information entirely. The primary challenge now is to distinguish between background polymorphisms and pathogenic mutations. Recently developed strategies for rare monogenic disorders have met with some early success. These strategies include filtering for potential causal variants based on frequency and function, and also ranking variants based on conservation scores and predicted deleteriousness to protein structure. Here, we review the recent literature in the use of high-throughput sequence data and its analysis in the discovery of causal mutations for rare disorders.
Complement factor H shows very strong association with Age-related Macular Degeneration (AMD), and recent data suggest that multiple causal variants are associated with disease. To refine the location of the disease-associated variants, we characterized in detail the structural variation at CFH and its paralogs, including two copy number polymorphisms (CNP), CNP147 and CNP148, and several rare deletions and duplications. Examination of 34 AMD-enriched extended families (N = 293) and AMD cases (White N = 4210; Indian = 134; Malay = 140) and controls (White N = 3229; Indian = 117; Malay = 2390) demonstrated that deletion CNP148 was protective against AMD, independent of SNPs at CFH. Regression analysis of seven common haplotypes showed three haplotypes, H1, H6 and H7, as conferring risk for AMD development. Being the most common haplotype, H1 confers the greatest risk by increasing the odds of AMD by 2.75-fold (95% CI = [2.51, 3.01]; p = 8.31×10^−109); Caucasian (H6) and Indian-specific (H7) recombinant haplotypes increase the odds of AMD by 1.85-fold (p = 3.52×10^−9) and by 15.57-fold (P = 0.007), respectively. We identified a 32-kb region downstream of Y402H (rs1061170), shared by all three risk haplotypes, suggesting that this region may be critical for AMD development. Further analysis showed that two SNPs within the 32 kb block, rs1329428 and rs203687, optimally explain disease association. rs1329428 resides in a 20 kb unique sequence block, but rs203687 resides in a 12 kb block that is 89% similar to a noncoding region contained in ΔCNP148. We conclude that causal variation in this region potentially encompasses both regulatory effects at single markers and copy number.
Self-identified race or ethnic group is used to determine normal reference standards in the prediction of pulmonary function. We conducted a study to determine whether the genetically determined percentage of African ancestry is associated with lung function and whether its use could improve predictions of lung function among persons who identified themselves as African American.
We assessed the ancestry of 777 participants self-identified as African American in the Coronary Artery Risk Development in Young Adults (CARDIA) study and evaluated the relation between pulmonary function and ancestry by means of linear regression. We performed similar analyses of data for two independent cohorts of subjects identifying themselves as African American: 813 participants in the Health, Aging, and Body Composition (HABC) study and 579 participants in the Cardiovascular Health Study (CHS). We compared the fit of two types of models to lung-function measurements: models based on the covariates used in standard prediction equations and models incorporating ancestry. We also evaluated the effect of the ancestry-based models on the classification of disease severity in two asthma-study populations.
African ancestry was inversely related to forced expiratory volume in 1 second (FEV1) and forced vital capacity in the CARDIA cohort. These relations were also seen in the HABC and CHS cohorts. In predicting lung function, the ancestry-based model fit the data better than standard models. Ancestry-based models resulted in the reclassification of asthma severity (based on the percentage of the predicted FEV1) in 4 to 5% of participants.
Current predictive equations, which rely on self-identified race alone, may misestimate lung function among subjects who identify themselves as African American. Incorporating ancestry into normative equations may improve lung-function estimates and more accurately categorize disease severity. (Funded by the National Institutes of Health and others.)
Although statins are efficacious for lowering LDL-cholesterol (LDLC), there is wide inter-individual variation in response. We tested the extent to which combined effects of common alleles of LDLR and HMGCR can contribute to this variability.
Methods and Results
Haplotypes in the LDLR 3′-untranslated region (3UTR) were tested for association with lipid-lowering response to simvastatin treatment in the Cholesterol and Pharmacogenetics (CAP) trial (335 African-Americans and 609 European-Americans). LDLR haplotype 5 (L5) was associated with smaller simvastatin-induced reductions in LDLC, total cholesterol, non-HDL cholesterol, and apolipoprotein B (P=0.0002–0.03) in African-Americans, but not European-Americans. The combined presence of L5 and previously described HMGCR haplotypes in African-Americans was associated with significantly attenuated apoB reduction (−22.4±1.5%, N=89) both compared to noncarriers (−30.6±1.5%, N=78, P=0.0001) and to carriers of either individual haplotype (−28.2±1.1%, N=158, P=0.001). We observed similar differences when measuring simvastatin-mediated induction of LDLR surface expression using lymphoblast cell lines (P=0.03).
We have identified a common LDLR 3UTR haplotype that is associated with attenuated lipid-lowering response to simvastatin treatment. Response was further reduced in individuals with both LDLR and previously described HMGCR haplotypes. Previously identified racial differences in statin efficacy were partially explained by increased prevalence of these combined haplotypes in African-Americans.
LDLR; HMGCR; statin; LDL-cholesterol; pharmacogenomics
While whole genome resequencing remains expensive, genomic partitioning provides an affordable means of targeting sequence efforts towards regions of high interest. There are several competitive methods for targeted capture; these include molecular inversion probes, microdroplet-segregated multiplex PCR, and on-array or in-solution capture-by-hybridization. Enrichment of the human exome by array hybridization has been successfully applied to pinpoint the causative allele of Mendelian disorders. This protocol focuses on the application of Agilent 1M arrays for capture-by-hybridization and sequencing on the Illumina platform, though the library preparation method may be adaptable to other vendors’ array platforms and sequencing technologies.
Resequencing; exome; hybridization; targeted enrichment
The discovery of expression quantitative trait loci (“eQTLs”) can help to unravel genetic contributions to complex traits. We identified genetic determinants of human liver gene expression variation using two independent collections of primary tissue profiled with Agilent (n = 206) and Illumina (n = 60) expression arrays and Illumina SNP genotyping (550K), and we also incorporated data from a published study (n = 266). We found that ∼30% of SNP-expression correlations in one study failed to replicate in either of the others, even at thresholds yielding high reproducibility in simulations, and we quantified numerous factors affecting reproducibility. Our data suggest that drug exposure, clinical descriptors, and unknown factors associated with tissue ascertainment and analysis have substantial effects on gene expression and that controlling for hidden confounding variables significantly increases replication rate. Furthermore, we found that reproducible eQTL SNPs were heavily enriched near gene starts and ends, and subsequently resequenced the promoters and 3′UTRs for 14 genes and tested the identified haplotypes using luciferase assays. For three genes, significant haplotype-specific in vitro functional differences correlated directly with expression levels, suggesting that many bona fide eQTLs result from functional variants that can be mechanistically isolated in a high-throughput fashion. Finally, given our study design, we were able to discover and validate hundreds of liver eQTLs. Many of these relate directly to complex traits for which liver-specific analyses are likely to be relevant, and we identified dozens of potential connections with disease-associated loci. These included previously characterized eQTL contributors to diabetes, drug response, and lipid levels, and they suggest novel candidates such as a role for NOD2 expression in leprosy risk and C2orf43 in prostate cancer. In general, the work presented here will be valuable for future efforts to precisely identify and functionally characterize genetic contributions to a variety of complex traits.
Many disease-associated genetic variants do not alter protein sequences and are difficult to precisely identify. Discovery of expression quantitative trait loci (eQTL), or correlations between genetic variants and gene expression levels, offers one means of addressing this challenge. However, eQTL studies in primary cells have several shortcomings. In particular, their reproducibility is largely unknown, the variables that generate unreliable associations are uncharacterized, and the resolution of their findings is constrained by linkage disequilibrium. We performed a three-way replication study of eQTLs in primary human livers. We demonstrated that ∼67% of cis-eQTL associations are replicated in an independent study and that known polymorphisms overlapping expression probes, SNP-to-gene distance, and unmeasured confounding variables all influence the replication rate. We fine-mapped 14 eQTLs and identified causative polymorphisms in the promoter or 3′UTR for 3 genes, suggesting that a considerable fraction of eQTLs are driven by proximal variants that are amenable to functional isolation. Finally, we found hundreds of overlaps between SNPs associated with complex traits and replicated eQTL SNPs. Our data provide both cautionary (i.e. non-reproducibility of many strong eQTLs) and optimistic (i.e. precise identification of functional non-coding variants) forecasts for future eQTL analyses and the complex traits that they influence.
We demonstrate the successful application of exome sequencing1–3 to discover a gene for an autosomal dominant disorder, Kabuki syndrome (OMIM %147920). The exomes of ten unrelated probands were subjected to massively parallel sequencing. After filtering against SNP databases, there was no compelling candidate gene containing novel variants in all affected individuals. Less stringent filtering criteria permitted modest genetic heterogeneity or missing data, but identified multiple candidate genes. However, genotypic and phenotypic stratification highlighted MLL2, a Trithorax-group histone methyltransferase4, in which seven probands had novel nonsense or frameshift mutations. Follow-up Sanger sequencing detected MLL2 mutations in two of the three remaining cases, and in 26 of 43 additional cases. In families where parental DNA was available, the mutation was confirmed to be de novo (n = 12) or transmitted (n = 2) in concordance with phenotype. Our results strongly suggest that mutations in MLL2 are a major cause of Kabuki syndrome.
The distribution of lipoprotein(a) [Lp(a)] levels can differ dramatically across diverse racial/ethnic populations. The extent to which genetic variation in LPA can explain these differences is not fully understood. To explore this, 19 LPA tagSNPs were genotyped in 7,159 participants from the Third National Health and Nutrition Examination Survey (NHANES III). NHANES III is a diverse population-based survey with DNA samples linked to hundreds of quantitative traits, including serum Lp(a). Tests of association between LPA variants and transformed Lp(a) levels were performed across the three different NHANES subpopulations (non-Hispanic whites, non-Hispanic blacks, and Mexican Americans). At a significance threshold of p<0.0001, 15 of the 19 SNPs tested were strongly associated with Lp(a) levels in at least one subpopulation, six in at least two subpopulations, and none in all three subpopulations. In non-Hispanic whites, three variants were associated with Lp(a) levels, including previously known rs6919246 (p = 1.18×10^−30). Additionally, 12 and 6 variants had significant associations in non-Hispanic blacks and Mexican Americans, respectively. The additive effects of these associated alleles explained up to 11% of the variance observed for Lp(a) levels in the different racial/ethnic populations. The findings reported here replicate previous candidate gene and genome-wide association studies for Lp(a) levels in European-descent populations and extend these findings to other populations. While we demonstrate that LPA is an important contributor to Lp(a) levels regardless of race/ethnicity, the lack of generalization of associations across all subpopulations suggests that specific LPA variants may be contributing to the observed Lp(a) between-population variance.
Signatures of natural selection occur throughout the human genome and can be detected at the sequence level. We have re-sequenced ABCE1, a host candidate gene essential for HIV-1 capsid assembly, in European- (n=23) and African-descent (Yoruban; n=24) reference populations for genetic variation discovery. We identified an excess of rare genetic variation in Yoruban samples, and the resulting Tajima’s D was low (−2.27). The trend of excess rare variation persisted in flanking candidate genes ANAPC10 and OTUD4, suggesting that this pattern of positive selection can be detected across the 184.5kb examined on chromosome 4. Because of ABCE1’s role in HIV-1 replication, we re-sequenced the candidate gene in three small cohorts of HIV-1-infected or resistant individuals. We were able to confirm the excess of rare genetic variation among HIV-1 positive African-American individuals (n=53; Tajima’s D = −2.34). These results highlight the potential importance of ABCE1’s role in infectious diseases such as HIV-1.
ABCE1; African-Americans; single nucleotide polymorphisms; HIV-1
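Tajima's D, the statistic reported above, compares the mean number of pairwise differences with the number of segregating sites; an excess of rare variants pushes it negative. The sketch below uses the standard formula from Tajima (1989) on toy haplotypes. The function names and the example sequences are illustrative assumptions, not data or code from the study.

```python
import math
from itertools import combinations


def summarize(haplotypes):
    """Return (n, S, pi): sample size, segregating sites, mean pairwise differences."""
    n = len(haplotypes)
    length = len(haplotypes[0])
    seg_sites = sum(1 for j in range(length) if len({h[j] for h in haplotypes}) > 1)
    pairs = list(combinations(haplotypes, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    return n, seg_sites, pi


def tajimas_d(n, seg_sites, pi):
    """Tajima's D for n sequences (n >= 2) with seg_sites > 0 segregating sites."""
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n ** 2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / math.sqrt(variance)


# Toy example 1: every variant is a singleton (excess of rare alleles).
rare = ["TAAA", "ATAA", "AATA", "AAAT"]
d_rare = tajimas_d(*summarize(rare))

# Toy example 2: variants at intermediate frequency.
common = ["AA", "AA", "TT", "TT"]
d_common = tajimas_d(*summarize(common))

print(d_rare < 0, d_common > 0)
```

With only singletons the statistic is negative, and with intermediate-frequency variants it is positive, which matches the qualitative interpretation used in the abstract.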
We demonstrate the first successful application of exome sequencing to discover the gene for a rare, Mendelian disorder of unknown cause, Miller syndrome (OMIM %263750). For four affected individuals in three independent kindreds, we captured and sequenced coding regions to a mean coverage of 40X, and sufficient depth to call variants at ~97% of each targeted exome. Filtering against public SNP databases and a small number of HapMap exomes for genes with two novel variants in each of the four cases identified a single candidate gene, DHODH, which encodes a key enzyme in the pyrimidine de novo biosynthesis pathway. Sanger sequencing confirmed the presence of DHODH mutations in three additional families with Miller syndrome. Exome sequencing of a small number of unrelated, affected individuals is a powerful, efficient strategy for identifying the genes underlying rare Mendelian disorders and will likely transform the genetic analysis of monogenic traits.
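The filtering logic described above, keeping only genes that carry at least two novel variants in every affected individual, can be sketched as a set intersection. This is a minimal illustration under stated assumptions: the function name, the input layout (one list of `(gene, variant_id)` calls per case), and the toy identifiers are hypothetical and do not reproduce the study's pipeline.

```python
from collections import Counter


def genes_with_novel_hits_in_all(cases, known_ids, min_hits=2):
    """Return genes with at least `min_hits` novel variants in every case.

    cases     : list of per-individual variant calls, each a list of (gene, variant_id)
    known_ids : variant IDs already present in public databases or control exomes
    """
    per_case = []
    for calls in cases:
        counts = Counter(gene for gene, var_id in calls if var_id not in known_ids)
        per_case.append({gene for gene, hits in counts.items() if hits >= min_hits})
    return set.intersection(*per_case) if per_case else set()


# Toy data: two cases, made-up genes and variant IDs.
known_ids = {"v1", "v7"}
cases = [
    [("GENE_X", "v2"), ("GENE_X", "v3"), ("GENE_Y", "v1"), ("GENE_Y", "v4")],
    [("GENE_X", "v5"), ("GENE_X", "v6"), ("GENE_Z", "v8"), ("GENE_Z", "v9")],
]
shared = genes_with_novel_hits_in_all(cases, known_ids)
print(shared)
```

Lowering `min_hits` or requiring the gene in only a subset of cases would relax the filter, at the cost of a longer candidate list.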
Genome-wide association studies suggest that common genetic variants explain only a small fraction of heritable risk for common diseases, raising the question of whether rare variants account for a significant fraction of unexplained heritability1,2. While DNA sequencing costs have fallen dramatically3, they remain far from what is necessary for rare and novel variants to be routinely identified at a genome-wide scale in large cohorts. We have therefore sought to develop second-generation methods for targeted sequencing of all protein-coding regions ('exomes'), to reduce costs while enriching for discovery of highly penetrant variants. Here we report on the targeted capture and massively parallel sequencing of the exomes of twelve humans. These include eight HapMap individuals representing three populations4, and four unrelated individuals with a rare dominantly inherited disorder, Freeman-Sheldon syndrome (FSS)5. We demonstrate the sensitive and specific identification of rare and common variants in over 300 megabases (Mb) of coding sequence. Using FSS as a proof-of-concept, we show that candidate genes for monogenic disorders can be identified by exome sequencing of a small number of unrelated, affected individuals. This strategy may be extendable to diseases with more complex genetics through larger sample sizes and appropriate weighting of nonsynonymous variants by predicted functional impact.
Statins effectively lower total and plasma LDL-cholesterol, but the magnitude of decrease varies among individuals. To identify single nucleotide polymorphisms (SNPs) contributing to this variation, we performed a combined analysis of genome-wide association (GWA) results from three trials of statin efficacy.
Methods and Principal Findings
Bayesian and standard frequentist association analyses were performed on untreated and statin-mediated changes in LDL-cholesterol, total cholesterol, HDL-cholesterol, and triglyceride on a total of 3932 subjects using data from three studies: Cholesterol and Pharmacogenetics (40 mg/day simvastatin, 6 weeks), Pravastatin/Inflammation CRP Evaluation (40 mg/day pravastatin, 24 weeks), and Treating to New Targets (10 mg/day atorvastatin, 8 weeks). Genotype imputation was used to maximize genomic coverage and to combine information across studies. Phenotypes were normalized within each study to account for systematic differences among studies, and a fixed-effects analysis of the combined sample was performed to detect consistent effects across studies. Two SNP associations were assessed as having posterior probability greater than 50%, indicating that they were more likely than not to be genuinely associated with statin-mediated lipid response. SNP rs8014194, located within the CLMN gene on chromosome 14, was strongly associated with statin-mediated change in total cholesterol with an 84% probability by Bayesian analysis, and a p-value exceeding conventional levels of genome-wide significance by frequentist analysis (P = 1.8×10⁻⁸). This SNP was less significantly associated with change in LDL-cholesterol (posterior probability = 0.16, P = 4.0×10⁻⁶). Bayesian analysis also assigned a 51% probability that rs4420638, located in APOC1 and near APOE, was associated with change in LDL-cholesterol.
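As an illustration of how evidence can be combined across studies under a fixed-effects model, the sketch below uses inverse-variance weighting of per-study effect estimates. This is one common approach and is an assumption here; the abstract does not specify the exact procedure used, and the function name `fixed_effect_meta` and its inputs are hypothetical.

```python
import math


def fixed_effect_meta(betas, ses):
    """Inverse-variance fixed-effect combination of per-study estimates.

    betas: per-study effect estimates for one SNP (hypothetical input).
    ses:   matching standard errors, all strictly positive.
    Returns (combined beta, combined SE, z-score, two-sided p-value).
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("betas and ses must be non-empty and of equal length")
    if any(se <= 0 for se in ses):
        raise ValueError("standard errors must be positive")

    weights = [1.0 / se ** 2 for se in ses]   # more precise studies weigh more
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    # Two-sided p-value under a standard normal null.
    p = math.erfc(abs(z) / math.sqrt(2))
    return beta, se, z, p
```

Under this model every study is assumed to estimate the same underlying effect, so a SNP whose effect is consistent in direction and size across the three trials gains significance when the studies are pooled.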
Conclusions and Significance
Using combined GWA analysis from three clinical trials involving nearly 4,000 individuals treated with simvastatin, pravastatin, or atorvastatin, we have identified SNPs that may be associated with variation in the magnitude of statin-mediated reduction in total and LDL-cholesterol, including one in the CLMN gene for which statistical evidence for association exceeds conventional levels of genome-wide significance.
PRINCE and TNT are not registered. CAP is registered at ClinicalTrials.gov (NCT00451828).