Familial dilated cardiomyopathy is a genetically heterogeneous disease with >30 known genes. TTN truncating variants were recently implicated in a candidate gene study to cause 25% of familial and 18% of sporadic dilated cardiomyopathy (DCM) cases.
Methods and Results
We used an unbiased genome-wide approach employing both linkage analysis and variant filtering across the exome sequences of 48 individuals affected with DCM from 17 families to identify the genetic cause. Linkage analysis ranked the TTN region as falling under the second highest genome-wide multipoint linkage peak, MLOD 1.59. We identified six TTN truncating variants carried by individuals affected with DCM in 7 of 17 DCM families (LOD 2.99); 2 of these 7 families also had novel missense variants that segregated with disease. Two additional novel truncating TTN variants did not segregate with DCM. Nucleotide diversity at the TTN locus, including missense variants, was comparable to five other known DCM genes. The average number of missense variants in the exome sequences from the DCM cases or the ~5,400 cases from the Exome Sequencing Project was ~23 per individual. The average number of TTN truncating variants in the Exome Sequencing Project was 0.014 per individual. We also identified a region (chr9q21.11-q22.31) with no known DCM genes with a maximum heterogeneity LOD score of 1.74.
These data suggest that TTN truncating variants contribute to DCM cause. However, the lack of segregation of all identified TTN truncating variants illustrates the challenge of determining variant pathogenicity even with full exome sequencing.
genetics; human; genome-wide analysis; dilated cardiomyopathy; exome
Efforts to define the genetic architecture underlying variable statin response have met with limited success, possibly because previous studies were limited to effects measured at a single dose. We leveraged electronic medical records (EMRs) to extract potency (ED50) and efficacy (Emax) of statin dose-response curves and tested them for association with 144 pre-selected variants. Two large biobanks were used to construct dose-response curves for 2,026 (simvastatin) and 2,252 subjects (atorvastatin). Atorvastatin was more efficacious, more potent, and demonstrated less inter-individual variability than simvastatin. A pharmacodynamic variant emerging from randomized trials (PRDM16) was associated with Emax for both. For atorvastatin, Emax was 51.7 mg/dl in subjects homozygous for the minor allele versus 75.0 mg/dl for those homozygous for the major allele. We also identified several loci associated with ED50. The extraction of rigorously defined traits from EMRs for pharmacogenetic studies represents a promising approach to further understanding of genetic factors contributing to drug response.
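To make the two extracted traits concrete, the following is a minimal sketch that assumes a simple hyperbolic Emax model, E(d) = Emax · d / (ED50 + d). The abstract does not state which functional form was fitted, so the model, the function name `emax_response`, and the example values are all illustrative assumptions.

```python
def emax_response(dose_mg, emax, ed50):
    """Predicted effect at a given dose under a simple hyperbolic Emax model.

    Assumption: E(d) = emax * d / (ed50 + d). The source abstract does not
    specify the curve that was fitted; this form is for illustration only.
    """
    if dose_mg < 0 or ed50 <= 0:
        raise ValueError("dose must be non-negative and ed50 must be positive")
    return emax * dose_mg / (ed50 + dose_mg)


# Hypothetical values: Emax (efficacy) is the plateau of the curve, and
# ED50 (potency) is the dose that produces half of that plateau.
half_effect = emax_response(dose_mg=10.0, emax=60.0, ed50=10.0)
```

Under this assumed form, a lower ED50 shifts the curve toward smaller doses (greater potency), while a larger Emax raises the plateau (greater efficacy).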
Marked prolongation of the QT interval and polymorphic ventricular tachycardia following medication (drug-induced long QT syndrome, diLQTS) is a severe adverse drug reaction (ADR) that phenocopies congenital long QT syndrome (cLQTS) and is one of the leading causes for drug withdrawal and relabeling. We evaluated the frequency of rare non-synonymous variants in genes contributing to the maintenance of heart rhythm in cases of diLQTS using targeted capture coupled to next generation sequencing. Eleven of 31 diLQTS subjects (36%) carried a novel missense mutation in genes with known congenital arrhythmia associations or a known cLQTS mutation. In the 26 Caucasian subjects, 23% carried a highly conserved rare variant predicted to be deleterious to protein function in these genes compared with only 2-4% in public databases (p < 0.003). We conclude that rare variation in genes responsible for congenital arrhythmia syndromes is frequent in diLQTS. Our findings demonstrate that diLQTS is a pharmacogenomic syndrome predisposed by rare genetic variants.
pharmacogenomics; sudden cardiac death; adverse drug reaction; next generation sequencing
Many statistical analyses of genetic data rely on the assumption of independence among samples. Consequently, relatedness is either modeled in the analysis or samples are removed to “clean” the data of any pairwise relatedness above a tolerated threshold. Current methods do not maximize the number of unrelated individuals retained for further analysis, and this is a needless loss of resources. We report a novel application of graph theory that identifies the maximum set of unrelated samples in any dataset given a user-defined threshold of relatedness as well as all networks of related samples. We have implemented this method into an open source program called Pedigree Reconstruction and Identification of a Maximum Unrelated Set, PRIMUS. We show that PRIMUS outperforms the three existing methods, allowing researchers to retain up to 50% more unrelated samples. A unique strength of PRIMUS is its ability to weight the maximum clique selection using additional criteria (e.g. affected status and data missingness). PRIMUS is a permanent solution to identifying the maximum number of unrelated samples for a genetic analysis.
genome-wide association study; Bron–Kerbosch; cryptic relatedness; bioinformatics; sample selection
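The graph-theoretic idea can be sketched as follows: treat samples as vertices, connect every pair that is NOT related above the threshold, and find a maximum clique in that graph with Bron–Kerbosch. This is only an illustrative sketch and not the PRIMUS implementation; the function name `max_unrelated_set` and the input layout are assumptions, and the weighting by affected status or missingness is not reproduced.

```python
def max_unrelated_set(samples, related_pairs):
    """Return one largest set of samples that contains no related pair.

    samples: iterable of sample IDs.
    related_pairs: iterable of (a, b) pairs whose relatedness exceeds the
    user-defined threshold (hypothetical input layout).
    """
    samples = set(samples)
    related = {s: set() for s in samples}
    for a, b in related_pairs:
        related[a].add(b)
        related[b].add(a)
    # Complement graph: neighbours of s are the samples unrelated to s.
    unrelated = {s: samples - related[s] - {s} for s in samples}

    best = set()

    def bron_kerbosch(r, p, x):
        nonlocal best
        if not p and not x:
            # r is a maximal clique; keep it if it is the largest so far.
            if len(r) > len(best):
                best = set(r)
            return
        # Pivot on the vertex with the most neighbours in p to prune branches.
        pivot = max(p | x, key=lambda v: len(unrelated[v] & p))
        for v in list(p - unrelated[pivot]):
            bron_kerbosch(r | {v}, p & unrelated[v], x & unrelated[v])
            p = p - {v}
            x = x | {v}

    bron_kerbosch(set(), set(samples), set())
    return best


# Hypothetical example: B is related to both A and C; D is unrelated to all.
kept = max_unrelated_set(["A", "B", "C", "D"], [("A", "B"), ("B", "C")])
```

Dropping only B retains three samples, whereas a greedy filter that removed A and then C would retain two. Maximum clique search is exponential in the worst case, so a sketch like this is best applied per network of related samples rather than to a whole dataset at once.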
Heterozygous mutations in EFTUD2 were identified in 12 individuals with a rare sporadic craniofacial condition termed Mandibulofacial dysostosis with microcephaly (MIM 610536). We present clinical and radiographic features of three additional patients with de novo heterozygous mutations in EFTUD2. Although clinical features overlap with findings of the original report (choanal atresia, cleft palate, maxillary and mandibular hypoplasia, and microtia), microcephaly was present in two of three patients and cognitive impairment was milder in those with head circumference proportional to height. Our cases expand the phenotypic spectrum to include epibulbar dermoids and zygomatic arch clefting. We suggest that craniofacial computed tomography studies to assess cleft of zygomatic arch may assist in making this diagnosis. We recommend consideration of EFTUD2 testing in individuals with features of oculo-auriculo-vertebral spectrum and bilateral microtia, or individuals with atypical CHARGE syndrome who do not have a CHD7 mutation, particularly those with a zygomatic arch cleft. The absence of microcephaly in one patient indicates that it is a highly variable phenotypic feature.
craniofacial development; EFTUD2; epibulbar dermoid; craniofacial microsomia; oculo-auriculo-vertebral spectrum (OAVS); choanal atresia
As a first step toward understanding how rare variants contribute to risk for complex diseases, we sequenced 15,585 human protein-coding genes to an average median depth of 111× in 2440 individuals of European (n = 1351) and African (n = 1088) ancestry. We identified over 500,000 single-nucleotide variants (SNVs), the majority of which were rare (86% with a minor allele frequency less than 0.5%), previously unknown (82%), and population-specific (82%). On average, 2.3% of the 13,595 SNVs each person carried were predicted to affect protein function of ∼313 genes per genome, and ∼95.7% of SNVs predicted to be functionally important were rare. This excess of rare functional variants is due to the combined effects of explosive, recent accelerated population growth and weak purifying selection. Furthermore, we show that large sample sizes will be required to associate rare variants with complex traits.
Establishing the age of each mutation segregating in contemporary human populations is important to fully understand our evolutionary history1,2 and will help facilitate the development of new approaches for disease gene discovery3. Large-scale surveys of human genetic variation have reported signatures of recent explosive population growth4-6, notable for an excess of rare genetic variants, qualitatively suggesting that many mutations arose recently. To more quantitatively assess the distribution of mutation ages, we resequenced 15,336 genes in 6,515 individuals of European (n=4,298) and African (n=2,217) American ancestry and inferred the age of 1,146,401 autosomal single nucleotide variants (SNVs). We estimate that ~73% of all protein-coding SNVs and ~86% of SNVs predicted to be deleterious arose in the past 5,000-10,000 years. The average age of deleterious SNVs varied significantly across molecular pathways, and disease genes contained a significantly higher proportion of recently arisen deleterious SNVs compared to other genes. Furthermore, European Americans had an excess of deleterious variants in essential and Mendelian disease genes compared to African Americans, consistent with weaker purifying selection due to the out-of-Africa dispersal. Our results better delimit the historical details of human protein-coding variation, illustrate the profound effect recent human history has had on the burden of deleterious SNVs segregating in contemporary populations, and provide important practical information that can be used to prioritize variants in disease gene discovery.
Toll-like receptor (TLR)-mediated innate immune responses are important in early host defense. Using a candidate gene approach, we previously identified genetic variation within TLR1 that is associated with hyper-responsiveness to a TLR1/2 agonist in vitro and with death and organ dysfunction in patients with sepsis. Here we report a genome-wide association study designed to identify genetic loci controlling whole blood cytokine responses to the TLR1/2 lipopeptide agonist, Pam3CSK4 ex vivo. We identified a very strong association (p<1×10−27) between genetic variation within the TLR10/1/6 locus on chromosome 4, and Pam3CSK4-induced cytokine responses. This was the predominant association explaining over 35% of the population variance for this phenotype. Notably, strong associations were observed within TLR10 suggesting genetic variation in TLR10 may influence bacterial lipoprotein-induced responses. These findings establish the TLR10/1/6 locus as the dominant common genetic factor controlling inter-individual variability in Pam3CSK4-induced whole blood responses in the healthy population.
TLR; polymorphism; genomics; innate immunity
Familial dyskinesia with facial myokymia (FDFM) is an autosomal dominant disorder that is exacerbated by anxiety. In a five-generation family of German ancestry we previously mapped FDFM to chromosome 3p21-3q21. The 72.5 Mbp linkage region was too large for traditional positional mutation identification.
To identify the gene responsible for FDFM by exome resequencing of a single affected individual.
Design, Setting and Participants
We performed whole exome sequencing in one affected individual and used a series of bioinformatic filters, including functional significance and presence in dbSNP or 1000 Genomes project, to reduce the number of candidate variants. Co-segregation analysis was performed in 15 additional individuals in three generations.
The exome contained 23428 single nucleotide variants, of which 9391 were missense, nonsense or splice site alterations. The critical region contained 323 variants, five of which were not present in one of the sequence-databases. Adenylate cyclase 5 (ADCY5) was the only gene in which the variant (c.2176G>A) was co-transmitted perfectly with disease status and was not present in 3510 control Caucasian exomes. This residue is highly conserved and the change is nonconservative and predicted to be damaging.
ADCY5 is highly expressed in striatum. Mice deficient in Adcy5 develop a movement disorder that is worsened by stress. We conclude that FDFM likely results from a missense mutation in ADCY5. This study demonstrates the power of a single exome sequence in combination with linkage information to identify causative genes for rare autosomal dominant Mendelian diseases.
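The filtering strategy described above (functional class, location in the linkage region, absence from public databases) can be sketched roughly as below. The record layout, the function name `filter_candidates`, and the example coordinates are hypothetical and chosen only for illustration; co-segregation testing in relatives would follow as a separate step.

```python
# Variant classes retained by the functional filter named in the abstract.
FUNCTIONAL = {"missense", "nonsense", "splice_site"}


def filter_candidates(variants, region, known_variant_keys):
    """Keep functional variants inside the linkage region that are absent
    from public databases.

    variants: list of dicts with hypothetical keys
              'chrom', 'pos', 'ref', 'alt', 'effect'.
    region: (chrom, start, end) tuple for the linkage interval.
    known_variant_keys: set of (chrom, pos, ref, alt) tuples already seen in
              databases such as dbSNP or 1000 Genomes.
    """
    chrom, start, end = region
    kept = []
    for v in variants:
        key = (v["chrom"], v["pos"], v["ref"], v["alt"])
        in_region = v["chrom"] == chrom and start <= v["pos"] <= end
        if in_region and v["effect"] in FUNCTIONAL and key not in known_variant_keys:
            kept.append(v)
    return kept


# Made-up records, not data from the study.
example = [
    {"chrom": "3", "pos": 150, "ref": "G", "alt": "A", "effect": "missense"},
    {"chrom": "3", "pos": 160, "ref": "C", "alt": "T", "effect": "synonymous"},
    {"chrom": "3", "pos": 170, "ref": "T", "alt": "C", "effect": "nonsense"},
    {"chrom": "5", "pos": 155, "ref": "A", "alt": "G", "effect": "missense"},
]
known = {("3", 170, "T", "C")}
candidates = filter_candidates(example, ("3", 100, 200), known)
```

Each filter is cheap on its own; the reduction comes from applying them jointly and then testing the few survivors for co-segregation with disease.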
Human exome sequencing is a recently developed tool to aid in the discovery of novel coding variants. Now broadly applied, exome sequencing datasets provide a novel opportunity to evaluate the allele frequencies of previously published pathogenic rare variants.
Methods and Results
We examined the exome dataset from the NHLBI Exome Sequencing Project (ESP) and compared this dataset with a catalog of 197 previously published rare variants reported as causative of dilated cardiomyopathy (DCM) from familial and sporadic cases. Of these 197, 33 (16.8%) were also present in the ESP database, raising the question of whether they were uncommon polymorphisms. Supporting functional data has been published for 14 of the 33 (42%), suggesting they are unlikely to be false positives. The frequencies of these functional variants in the ESP dataset ranged from 0.02–1.33% (median 0.04%), which when applied as a cut-off to filter variants in a DCM pedigree identified an additional DCM candidate gene. A greater proportion of sporadic DCM cases had variants that were present in the ESP dataset vs novel variants (i.e. not in ESP; 44% vs 21%, p=0.002), suggesting some of the variants identified as disease causing in sporadic DCM are either false positives or low penetrance alleles in human populations.
Rare nonsynonymous variants identified in DCM subjects that are also present at very low frequencies in public databases are likely relevant for DCM. Allele frequencies >0.04% are of less certain pathogenicity, especially if identified in sporadic cases, although this cut-off should be viewed as preliminary.
cardiomyopathy; genetics; genes
It is well established that autism spectrum disorders (ASD) have a strong genetic component. However, for at least 70% of cases, the underlying genetic cause is unknown1. Under the hypothesis that de novo mutations underlie a substantial fraction of the risk for developing ASD in families with no previous history of ASD or related phenotypes (so-called sporadic or simplex families2,3), we sequenced all coding regions of the genome, i.e. the exome, for parent-child trios exhibiting sporadic ASD, including 189 new trios and 20 previously reported4. We also sequenced the exomes of 50 unaffected siblings corresponding to these new (n = 31) and previously reported trios (n = 19)4, for a total of 677 individual exomes from 209 families. Here we show de novo point mutations are overwhelmingly paternal in origin (4:1 bias) and positively correlated with paternal age, consistent with the modest increased risk for children of older fathers to develop ASD5. Moreover, 39% (49/126) of the most severe or disruptive de novo mutations map to a highly interconnected beta-catenin/chromatin remodeling protein network ranked significantly for autism candidate genes. In proband exomes, recurrent protein-altering mutations were observed in two genes, CHD8 and NTNG1. Mutation screening of six candidate genes in 1,703 ASD probands identified additional de novo, protein-altering mutations in GRIN2B, LAMC3, and SCN1A. Combined with copy number variant (CNV) data, these results suggest extreme locus heterogeneity but also provide a target for future discovery, diagnostics, and therapeutics.
Adverse neurodevelopmental sequelae are reported among children who undergo early cardiac surgery to repair congenital heart defects (CHD). APOE genotype has previously been determined to contribute to the prediction of these outcomes. Understanding further genetic causes for the development of poor neurobehavioral outcomes should enhance patient risk stratification and improve both prevention and treatment strategies.
We performed a prospective observational study of children who underwent cardiac surgery before six months of age; this included a neurodevelopmental evaluation between their fourth and fifth birthdays. Attention and behavioral skills were assessed through parental report utilizing the Attention Deficit-Hyperactivity Disorder-IV scale preschool edition (ADHD-IV), and Child Behavior Checklist (CBCL/1.5-5), respectively. Of the seven investigated, three neurodevelopmental phenotypes met genomic quality control criteria. Linear regression was performed to determine the effect of genome-wide genetic variation on these three neurodevelopmental measures in 316 subjects.
This genome-wide association study identified single nucleotide polymorphisms (SNPs) associated with three neurobehavioral phenotypes in the postoperative children: ADHD-IV Impulsivity/Hyperactivity, CBCL/1.5-5 PDPs, and CBCL/1.5-5 Total Problems. The most predictive SNPs for each phenotype were: a LGALS8 intronic SNP, rs4659682, associated with ADHD-IV Impulsivity (P = 1.03×10−6); a PCSK5 intronic SNP, rs2261722, associated with CBCL/1.5-5 PDPs (P = 1.11×10−6); and an intergenic SNP, rs11617488, 50 kb from FGF9, associated with CBCL/1.5-5 Total Problems (P = 3.47×10−7). Ten SNPs (3 for ADHD-IV Impulsivity, 5 for CBCL/1.5-5 PDPs, and 2 for CBCL/1.5-5 Total Problems) had p<10−5.
No SNPs met genome-wide significance for our three neurobehavioral phenotypes; however, 10 SNPs reached a threshold for suggestive significance (p<10−5). Given the unique nature of this cohort, larger studies and/or replication are not possible. Studies to further investigate the mechanisms through which these newly identified genes may influence neurodevelopment dysfunction are warranted.
Polymorphisms within the ICAM1 structural gene have been shown to influence circulating levels of soluble intercellular adhesion molecule -1 (sICAM-1) but their relation to atherosclerosis has not been clearly established. We sought to determine whether ICAM1 SNPs are associated with circulating sICAM-1 concentration, coronary artery calcium (CAC), and common and internal carotid intima medial thickness (IMT).
Methods and Results
3,550 black and white Coronary Artery Risk Development in Young Adults (CARDIA) Study subjects who participated in the year 15 and/or 20 examinations and were part of the Young Adult Longitudinal Study of Antioxidants (YALTA) ancillary study were included in this analysis. In whites, rs5498 was significantly associated with sICAM-1 (p < 0.001) and each G-allele of rs5498 was associated with 5% higher sICAM-1 concentration. In blacks, each C-allele of rs5490 was associated with 6% higher sICAM-1 level; this SNP was in strong linkage disequilibrium with rs5491, a functional variant. Subclinical measurements of atherosclerosis in either year 15 or year 20 were not significantly related to ICAM1 SNPs.
In CARDIA, ICAM1 DNA segment variants were associated with sICAM-1 protein level including the novel finding that levels differ by the functional variant rs5491. However, ICAM1 SNPs were not strongly related to either IMT or CAC. Our findings in CARDIA suggest that ICAM1 variants are not major early contributors to subclinical atherosclerosis.
cell adhesion molecules; atherosclerosis; coronary calcium; genetics; inflammation
Kabuki syndrome is a rare, multiple malformation disorder characterized by a distinctive facial appearance, cardiac anomalies, skeletal abnormalities, and mild to moderate intellectual disability. Simplex cases make up the vast majority of the reported cases with Kabuki syndrome, but parent-to-child transmission in more than a half-dozen instances indicates that it is an autosomal dominant disorder. We recently reported that Kabuki syndrome is caused by mutations in MLL2, a gene that encodes a Trithorax-group histone methyltransferase, a protein important in the epigenetic control of active chromatin states. Here, we report on the screening of 110 families with Kabuki syndrome. MLL2 mutations were found in 81/110 (74%) of families. In simplex cases for which DNA was available from both parents, 25 mutations were confirmed to be de novo, while a transmitted MLL2 mutation was found in two of three familial cases. The majority of variants found to cause Kabuki syndrome were novel nonsense or frameshift mutations that are predicted to result in haploinsufficiency. The clinical characteristics of MLL2 mutation-positive cases did not differ significantly from MLL2 mutation-negative cases with the exception that renal anomalies were more common in MLL2 mutation-positive cases. These results are important for understanding the phenotypic consequences of MLL2 mutations for individuals and their families as well as for providing a basis for the identification of additional genes for Kabuki syndrome.
Kabuki syndrome; MLL2; ALR; Trithorax group histone methyltransferase
Structural variations in the chromosome 22q11.2 region mediated by non-allelic homologous recombination result in 22q11.2 deletion (del22q11.2) and 22q11.2 duplication (dup22q11.2) syndromes. The majority of del22q11.2 cases have facial and cardiac malformations, immunologic impairments, specific cognitive profile and increased risk for schizophrenia and autism spectrum disorders. The phenotype of dup22q11.2 is frequently without physical features but includes the spectrum of neurocognitive abnormalities. Although there is substantial evidence that haploinsufficiency for TBX1 plays a role in the physical features of del22q11.2, it is not known which gene(s) in the critical 1.5 Mb region are responsible for the observed spectrum of behavioral phenotypes. We identified an individual with a balanced translocation 46,XY,t(1;22)(p36.1;q11.2) and a behavioral phenotype characterized by cognitive impairment, autism and schizophrenia in the absence of congenital malformations. Using somatic cell hybrids and comparative genomic hybridization we mapped the chromosome-22 breakpoint within intron 7 of the GNB1L gene. Copy number evaluations and direct DNA sequencing of GNB1L in 271 schizophrenia and 513 autism cases revealed dup22q11.2 in two families with autism and private GNB1L missense variants in conserved residues in three families (p=0.036). The identified missense variants affect residues in the WD40 repeat domains and are predicted to have deleterious effects on the protein. Prior studies provided evidence that GNB1L may have a role in schizophrenia. Our findings support involvement of GNB1L in autism spectrum disorders as well.
22q11.2; translocation; neurodevelopmental disorders
Evidence for the etiology of autism spectrum disorders (ASD) has consistently pointed to a strong genetic component complicated by substantial locus heterogeneity1,2. We sequenced the exomes of 20 sporadic cases of ASD and their parents, reasoning that these families would be enriched for de novo mutations of major effect. We identified 21 de novo mutations, of which 11 were protein-altering. Protein-altering mutations were significantly enriched for changes at highly conserved residues. We identified potentially causative de novo events in 4/20 probands, particularly among more severely affected individuals, in FOXP1, GRIN2B, SCN1A, and LAMC3. In the FOXP1 mutation carrier, we also observed a rare inherited CNTNAP2 mutation and provide functional support for a multihit model for disease risk3. Our results demonstrate that trio-based exome sequencing is a powerful approach for identifying novel candidate genes for ASD and suggest that de novo mutations may contribute substantially to the genetic risk for ASD.
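The core logic of trio-based de novo discovery is a set difference: a candidate de novo variant is one called in the child but in neither parent. The sketch below assumes each individual's calls are already reduced to a set of (chrom, pos, ref, alt) tuples; the function name and example calls are hypothetical, and a real pipeline would also need coverage and genotype-quality checks in both parents before accepting a candidate.

```python
def candidate_de_novo(child_calls, mother_calls, father_calls):
    """Return variants called in the child but in neither parent.

    Each argument is a set of (chrom, pos, ref, alt) tuples. This is only the
    set-difference step; it does not validate that the parents were
    adequately covered at each site.
    """
    return set(child_calls) - set(mother_calls) - set(father_calls)


# Made-up calls for illustration only.
child = {("2", 100, "A", "G"), ("7", 200, "C", "T"), ("12", 300, "G", "A")}
mother = {("2", 100, "A", "G")}
father = {("7", 200, "C", "T")}
de_novo = candidate_de_novo(child, mother, father)
```

Because false-negative calls in a parent look identical to true de novo events at this step, candidates from such a filter are typically confirmed by an independent method.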
Massively parallel sequencing has enabled the rapid, systematic identification of variants on a large scale. This has, in turn, accelerated the pace of gene discovery and disease diagnosis on a molecular level and has the potential to revolutionize methods particularly for the analysis of Mendelian disease. Its use has enabled investigators to interrogate variants both in the context of linkage intervals and also on a genome-wide scale, in the absence of linkage information entirely. The primary challenge now is to distinguish between background polymorphisms and pathogenic mutations. Recently developed strategies for rare monogenic disorders have met with some early success. These strategies include filtering for potential causal variants based on frequency and function, and also ranking variants based on conservation scores and predicted deleteriousness to protein structure. Here, we review the recent literature in the use of high-throughput sequence data and its analysis in the discovery of causal mutations for rare disorders.
Complement factor H shows very strong association with Age-related Macular Degeneration (AMD), and recent data suggest that multiple causal variants are associated with disease. To refine the location of the disease associated variants, we characterized in detail the structural variation at CFH and its paralogs, including two copy number polymorphisms (CNP), CNP147 and CNP148, and several rare deletions and duplications. Examination of 34 AMD-enriched extended families (N = 293) and AMD cases (White N = 4210; Indian = 134; Malay = 140) and controls (White N = 3229; Indian = 117; Malay = 2390) demonstrated that deletion CNP148 was protective against AMD, independent of SNPs at CFH. Regression analysis of seven common haplotypes showed three haplotypes, H1, H6 and H7, as conferring risk for AMD development. Being the most common haplotype, H1 confers the greatest risk by increasing the odds of AMD by 2.75-fold (95% CI = [2.51, 3.01]; p = 8.31×10−109); Caucasian (H6) and Indian-specific (H7) recombinant haplotypes increase the odds of AMD by 1.85-fold (p = 3.52×10−9) and by 15.57-fold (P = 0.007), respectively. We identified a 32-kb region downstream of Y402H (rs1061170), shared by all three risk haplotypes, suggesting that this region may be critical for AMD development. Further analysis showed that two SNPs within the 32 kb block, rs1329428 and rs203687, optimally explain disease association. rs1329428 resides in a 20 kb unique sequence block, but rs203687 resides in a 12 kb block that is 89% similar to a noncoding region contained in ΔCNP148. We conclude that causal variation in this region potentially encompasses both regulatory effects at single markers and copy number.
Self-identified race or ethnic group is used to determine normal reference standards in the prediction of pulmonary function. We conducted a study to determine whether the genetically determined percentage of African ancestry is associated with lung function and whether its use could improve predictions of lung function among persons who identified themselves as African American.
We assessed the ancestry of 777 participants self-identified as African American in the Coronary Artery Risk Development in Young Adults (CARDIA) study and evaluated the relation between pulmonary function and ancestry by means of linear regression. We performed similar analyses of data for two independent cohorts of subjects identifying themselves as African American: 813 participants in the Health, Aging, and Body Composition (HABC) study and 579 participants in the Cardiovascular Health Study (CHS). We compared the fit of two types of models to lung-function measurements: models based on the covariates used in standard prediction equations and models incorporating ancestry. We also evaluated the effect of the ancestry-based models on the classification of disease severity in two asthma-study populations.
African ancestry was inversely related to forced expiratory volume in 1 second (FEV1) and forced vital capacity in the CARDIA cohort. These relations were also seen in the HABC and CHS cohorts. In predicting lung function, the ancestry-based model fit the data better than standard models. Ancestry-based models resulted in the reclassification of asthma severity (based on the percentage of the predicted FEV1) in 4 to 5% of participants.
Current predictive equations, which rely on self-identified race alone, may misestimate lung function among subjects who identify themselves as African American. Incorporating ancestry into normative equations may improve lung-function estimates and more accurately categorize disease severity. (Funded by the National Institutes of Health and others.)
Although statins are efficacious for lowering LDL-cholesterol (LDLC), there is wide inter-individual variation in response. We tested the extent to which combined effects of common alleles of LDLR and HMGCR can contribute to this variability.
Methods and Results
Haplotypes in the LDLR 3′-untranslated region (3UTR) were tested for association with lipid-lowering response to simvastatin treatment in the Cholesterol and Pharmacogenetics (CAP) trial (335 African-Americans and 609 European-Americans). LDLR haplotype 5 (L5) was associated with smaller simvastatin-induced reductions in LDLC, total cholesterol, non-HDL cholesterol, and apolipoprotein B (P=0.0002–0.03) in African-Americans, but not European-Americans. The combined presence of L5 and previously described HMGCR haplotypes in African-Americans was associated with significantly attenuated apoB reduction (−22.4±1.5%, N=89) both compared to noncarriers (−30.6±1.5%, N=78, P=0.0001) and to carriers of either individual haplotype (−28.2±1.1%, N=158, P=0.001). We observed similar differences when measuring simvastatin-mediated induction of LDLR surface expression using lymphoblast cell lines (P=0.03).
We have identified a common LDLR 3UTR haplotype that is associated with attenuated lipid-lowering response to simvastatin treatment. Response was further reduced in individuals with both LDLR and previously described HMGCR haplotypes. Previously identified racial differences in statin efficacy were partially explained by increased prevalence of these combined haplotypes in African-Americans.
LDLR; HMGCR; statin; LDL-cholesterol; pharmacogenomics
While whole genome resequencing remains expensive, genomic partitioning provides an affordable means of targeting sequence efforts towards regions of high interest. There are several competitive methods for targeted capture; these include molecular inversion probes, microdroplet-segregated multiplex PCR, and on-array or in-solution capture-by-hybridization. Enrichment of the human exome by array hybridization has been successfully applied to pinpoint the causative allele of Mendelian disorders. This protocol focuses on the application of Agilent 1M arrays for capture-by-hybridization and sequencing on the Illumina platform, though the library preparation method may be adaptable to other vendors’ array platforms and sequencing technologies.
Resequencing; exome; hybridization; targeted enrichment
The discovery of expression quantitative trait loci (“eQTLs”) can help to unravel genetic contributions to complex traits. We identified genetic determinants of human liver gene expression variation using two independent collections of primary tissue profiled with Agilent (n = 206) and Illumina (n = 60) expression arrays and Illumina SNP genotyping (550K), and we also incorporated data from a published study (n = 266). We found that ∼30% of SNP-expression correlations in one study failed to replicate in either of the others, even at thresholds yielding high reproducibility in simulations, and we quantified numerous factors affecting reproducibility. Our data suggest that drug exposure, clinical descriptors, and unknown factors associated with tissue ascertainment and analysis have substantial effects on gene expression and that controlling for hidden confounding variables significantly increases replication rate. Furthermore, we found that reproducible eQTL SNPs were heavily enriched near gene starts and ends, and subsequently resequenced the promoters and 3′UTRs for 14 genes and tested the identified haplotypes using luciferase assays. For three genes, significant haplotype-specific in vitro functional differences correlated directly with expression levels, suggesting that many bona fide eQTLs result from functional variants that can be mechanistically isolated in a high-throughput fashion. Finally, given our study design, we were able to discover and validate hundreds of liver eQTLs. Many of these relate directly to complex traits for which liver-specific analyses are likely to be relevant, and we identified dozens of potential connections with disease-associated loci. These included previously characterized eQTL contributors to diabetes, drug response, and lipid levels, and they suggest novel candidates such as a role for NOD2 expression in leprosy risk and C2orf43 in prostate cancer. In general, the work presented here will be valuable for future efforts to precisely identify and functionally characterize genetic contributions to a variety of complex traits.
Many disease-associated genetic variants do not alter protein sequences and are difficult to precisely identify. Discovery of expression quantitative trait loci (eQTL), or correlations between genetic variants and gene expression levels, offers one means of addressing this challenge. However, eQTL studies in primary cells have several shortcomings. In particular, their reproducibility is largely unknown, the variables that generate unreliable associations are uncharacterized, and the resolution of their findings is constrained by linkage disequilibrium. We performed a three-way replication study of eQTLs in primary human livers. We demonstrated that ∼67% of cis-eQTL associations are replicated in an independent study and that known polymorphisms overlapping expression probes, SNP-to-gene distance, and unmeasured confounding variables all influence the replication rate. We fine-mapped 14 eQTLs and identified causative polymorphisms in the promoter or 3′UTR for 3 genes, suggesting that a considerable fraction of eQTLs are driven by proximal variants that are amenable to functional isolation. Finally, we found hundreds of overlaps between SNPs associated with complex traits and replicated eQTL SNPs. Our data provide both cautionary (i.e. non-reproducibility of many strong eQTLs) and optimistic (i.e. precise identification of functional non-coding variants) forecasts for future eQTL analyses and the complex traits that they influence.
We demonstrate the successful application of exome sequencing1–3 to discover a gene for an autosomal dominant disorder, Kabuki syndrome (OMIM %147920). The exomes of ten unrelated probands were subjected to massively parallel sequencing. After filtering against SNP databases, there was no compelling candidate gene containing novel variants in all affected individuals. Less stringent filtering criteria permitted modest genetic heterogeneity or missing data, but identified multiple candidate genes. However, genotypic and phenotypic stratification highlighted MLL2, a Trithorax-group histone methyltransferase4, in which seven probands had novel nonsense or frameshift mutations. Follow-up Sanger sequencing detected MLL2 mutations in two of the three remaining cases, and in 26 of 43 additional cases. In families where parental DNA was available, the mutation was confirmed to be de novo (n = 12) or transmitted (n = 2) in concordance with phenotype. Our results strongly suggest that mutations in MLL2 are a major cause of Kabuki syndrome.
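The trade-off between strict and relaxed filtering can be sketched as counting, for each gene, how many probands carry a novel variant in it, and then requiring at least some minimum number of carriers rather than all of them. The function name `genes_shared_by_probands`, the input layout, and the placeholder gene labels are hypothetical, and the genotypic and phenotypic stratification described in the abstract is not reproduced here.

```python
from collections import defaultdict


def genes_shared_by_probands(novel_variants, min_probands):
    """Count distinct probands with a novel variant in each gene.

    novel_variants: iterable of (proband_id, gene) pairs that survived
                    filtering against SNP databases (hypothetical layout).
    min_probands:   keep genes hit in at least this many probands; setting
                    it below the cohort size tolerates heterogeneity or
                    missing data.
    """
    carriers = defaultdict(set)
    for proband, gene in novel_variants:
        carriers[gene].add(proband)
    return {g: len(p) for g, p in carriers.items() if len(p) >= min_probands}


# Made-up example with four probands and placeholder gene labels.
hits = [
    ("P1", "GENE_A"), ("P2", "GENE_A"), ("P3", "GENE_A"),
    ("P1", "GENE_B"), ("P4", "GENE_B"),
    ("P2", "GENE_C"),
]
strict = genes_shared_by_probands(hits, min_probands=4)
relaxed = genes_shared_by_probands(hits, min_probands=3)
```

In this toy input the strict criterion returns no gene at all, while lowering the threshold surfaces one candidate; lowering it further would admit more genes, which is why additional stratification is needed to rank them.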