Although inversions have occasionally been found to be associated with disease susceptibility through interrupting a gene or its regulatory region, or by increasing the risk for deleterious secondary rearrangements, no association study has been specifically conducted for risks associated with inversions, mainly because existing approaches to detecting and genotyping inversions do not readily scale to a large number of samples. Based on our recently proposed approach to identifying and genotyping inversions using principal components analysis (PCA), we herein develop a method of detecting association between inversions and disease in a genome-wide fashion. Our method uses genotype data for single nucleotide polymorphisms (SNPs), and is thus cost-efficient and computationally fast. For an inversion polymorphism, local PCA around the inversion region is performed to infer the inversion genotypes of all samples. For many inversions, we found that some of the SNPs inside an inversion region are fixed in the two lineages of different orientations and thus can serve as surrogate markers. Our method can be applied to case-control and quantitative trait association studies to identify inversions that may interrupt a gene or the connection between a gene and its regulatory agents. Our method also offers a new avenue to identify inversions that are responsible for disease-causing secondary rearrangements. We applied our proposed approach to case-control data for psoriasis and identified novel associations with a few inversion polymorphisms.
Chromosomal inversion; Principal components analysis; Genome-wide association scan; Single-Nucleotide Polymorphism; Psoriasis
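The local PCA step described in the abstract above can be illustrated with a minimal sketch. It assumes a samples × SNPs matrix of 0/1/2 allele counts restricted to the inversion region, and it assumes that the three inversion genotypes appear as three clusters along the first principal component. The function names `local_pca_pc1` and `call_inversion_genotypes` and the one-dimensional k-means clustering are illustrative choices, not the authors' published procedure.

```python
import numpy as np

def local_pca_pc1(genotypes):
    """Return PC1 scores for a samples x SNPs matrix of 0/1/2 allele counts."""
    g = np.asarray(genotypes, dtype=float)
    centered = g - g.mean(axis=0)          # centre each SNP column
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, 0] * s[0]                  # sample scores on the first component

def call_inversion_genotypes(pc1, n_iter=50):
    """Cluster PC1 scores into three groups with 1-D k-means (an assumption).

    Labels 0 and 2 are the two homozygous clusters and 1 is the middle
    (heterozygous) cluster. The sign of PC1 is arbitrary, so which homozygote
    gets label 0 cannot be decided from PCA alone.
    """
    pc1 = np.asarray(pc1, dtype=float)
    centres = np.array([pc1.min(), (pc1.min() + pc1.max()) / 2.0, pc1.max()])
    labels = np.zeros(pc1.shape[0], dtype=int)
    for _ in range(n_iter):
        labels = np.argmin(np.abs(pc1[:, None] - centres[None, :]), axis=1)
        updated = np.array([
            pc1[labels == k].mean() if np.any(labels == k) else centres[k]
            for k in range(3)
        ])
        if np.allclose(updated, centres):
            break
        centres = updated
    return labels
```

The inferred labels could then be used as the predictor in an ordinary case-control or quantitative trait test. Orientation (which homozygous cluster is the inverted arrangement) would have to come from external information such as surrogate SNPs with known orientation.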
Retinitis pigmentosa (RP) is a group of inherited retinal disorders characterized by progressive photoreceptor degeneration. An accurate molecular diagnosis is essential for disease characterization and clinical prognoses. A retinal capture panel that enriches 186 known retinal disease genes, including 55 known RP genes, was developed. Targeted next-generation sequencing was performed for a cohort of 82 unrelated RP cases from Northern Ireland, including 46 simplex cases and 36 familial cases. Disease-causing mutations were identified in 49 probands, including 28 simplex cases and 21 familial cases, achieving a solving rate of 60 %. In total, 65 pathogenic mutations were found, and 29 of these were novel. Interestingly, the molecular information of 12 probands was consistent with neither their initial inheritance pattern nor their clinical diagnosis. Further clinical reassessment resulted in a refinement of the clinical diagnosis in 11 patients. This is the first study to apply next-generation sequencing-based, comprehensive molecular diagnoses to a large number of RP probands from Northern Ireland. Our study shows that molecular information can aid clinical diagnosis, potentially changing treatment options, current family counseling and management.
Recent genome-wide association studies of the adult human metabolome have identified genetic variants associated with relative levels of several acylcarnitines, which are important clinical correlates for chronic conditions such as type 2 diabetes and obesity. We have previously shown that these same metabolite levels are highly heritable at birth; however, no studies to our knowledge have examined genetic associations with these metabolites measured at birth. Here, we examine, in 743 newborns, 58 single nucleotide polymorphisms (SNPs) in 11 candidate genes previously associated with differing relative levels of short-chain acylcarnitines in adults. Six SNPs (rs2066938, rs3916, rs3794215, rs555404, rs558314, rs1799958) in the short chain acyl-CoA dehydrogenase gene (ACADS) were associated with neonatal C4 levels. Most significant was the G allele of rs2066938, which was associated with significantly higher levels of C4 (P=1.5×10^−29). This SNP explains 25% of the variation in neonatal C4 levels, which is similar to the variation previously reported in adult C4 levels. There were also significant (P<1×10^−4) associations between neonatal levels of C5-OH and SNPs in the solute carrier family 22 genes (SLC22A4 and SLC22A5) and the 3-methylcrotonyl-CoA carboxylase 1 gene (MCCC1). We have replicated, in newborns, SNP associations between metabolic traits and the ACADS and SLC22A4 genes observed in adults. This research has important implications not only for the identification of rare inborn errors of metabolism but also for personalized medicine and early detection of later life risks for chronic conditions.
short chain acylcarnitines; chronic disease; newborn screening; metabolic heritability
C-reactive protein (CRP) is a heritable biomarker of systemic inflammation and a predictor of cardiovascular disease (CVD). Large-scale genetic association studies for CRP have largely focused on individuals of European descent. We sought to uncover novel genetic variants for CRP in a multi-ethnic sample using the ITMAT Broad-CARe (IBC) array, a custom 50,000 SNP gene-centric array having dense coverage of over 2,000 candidate CVD genes. We performed analyses on 7570 African Americans (AA) from the Candidate gene Association Resource (CARe) study and race-combined meta-analyses that included 29,939 additional individuals of European descent from CARe, the Women’s Health Initiative (WHI) and KORA studies. We observed array-wide significance (p<2.2×10^−6) for four loci in AA, three of which have been reported previously in individuals of European descent (IL6R, p=2.0×10^−6; CRP, p=4.2×10^−71; APOE, p=1.6×10^−6). The fourth significant locus, CD36 (p=1.6×10^−6), was observed at a functional variant (rs3211938) that is extremely rare in individuals of European descent. We replicated the CD36 finding (p=1.8×10^−5) in an independent sample of 8041 AA women from WHI; a meta-analysis combining the CARe and WHI AA results at rs3211938 reached genome-wide significance (p=1.5×10^−10). In the race-combined meta-analyses, 13 loci reached significance, including ten (CRP, TOMM40/APOE/APOC1, HNF1A, LEPR, GCKR, IL6R, IL1RN, NLRP3, HNF4A and BAZ1B/BCL7B) previously associated with CRP, and one (ARNTL) previously reported to be nominally associated with CRP. Two novel loci were also detected (RPS6KB1, p=2.0×10^−6; CD36, p=1.4×10^−6). These results highlight both shared and unique genetic risk factors for CRP in AA compared to populations of European descent.
C-reactive protein; Inflammation; Multi-ethnic; Candidate gene
Previous studies have implicated genes encoding the 5-HT3AB receptors (HTR3A and HTR3B) and the serotonin transporter (SLC6A4), both independently and interactively, in alcohol (AD), cocaine (CD), and nicotine dependence (ND). However, whether these genetic effects also exist in subjects with comorbidities remains largely unknown. We used 1,136 African-American (AA) and 2,428 European-American (EA) subjects from the Study of Addiction: Genetics and Environment (SAGE) to determine associations between 88 genotyped or imputed variants within HTR3A, HTR3B, and SLC6A4 and three types of addictions, which were measured by DSM-IV diagnoses of AD, CD, and ND and the Fagerström Test for Nicotine Dependence (FTND), an independent measure of ND commonly used in tobacco research. Individual SNP-based association analysis revealed a significant association of rs2066713 in SLC6A4 with FTND in AA (beta = -1.39; P = 1.6E-04). Haplotype-based association analysis found one major haplotype formed by SNPs rs3891484 and rs3758987 in HTR3B that was significantly associated with AD in the AA sample, and another major haplotype T-T-G, formed by SNPs rs7118530, rs12221649, and rs2085421 in HTR3A, that showed significant association with FTND in the EA sample. Considering the biologic roles of the three genes and their functional relations, we used the GPU-based Generalized Multifactor Dimensionality Reduction (GMDR-GPU) program to test SNP-by-SNP interactions within the three genes and discovered two- to five-variant models that have significant impacts on AD, CD, ND, or FTND. Interestingly, most of the SNPs included in the genetic interaction model(s) for each addictive phenotype either overlap or are in high linkage disequilibrium between the AA and EA samples, suggesting that these detected variants in HTR3A, HTR3B, and SLC6A4 contribute interactively to the etiology of the three addictive phenotypes examined in this study.
Epistasis; HTR3A; HTR3B; multiple addictions; gene-by-gene interaction
Mitochondrial DNA (mtDNA) haplogroups are valuable for investigations in forensic science, molecular anthropology, and human genetics. In this study, we developed a custom panel of 61 mtDNA markers for high-throughput classification of European, African, and Native American/Asian mitochondrial haplogroup lineages. Using these mtDNA markers we constructed a mitochondrial haplogroup classification tree and classified 18,832 participants from the National Health and Nutrition Examination Surveys (NHANES). To our knowledge, this is the largest study to date characterizing mitochondrial haplogroups in a population-based sample from the United States, and the first study characterizing mitochondrial haplogroup distributions in self-identified Mexican Americans separately from Hispanic Americans of other descent. We observed clear differences in the distribution of maternal genetic ancestry consistent with proposed admixture models for these subpopulations, underscoring the genetic heterogeneity of the United States Hispanic population. The mitochondrial haplogroup distributions in the other self-identified racial/ethnic groups within NHANES were largely comparable to previous studies. Mitochondrial haplogroup classification was highly concordant with self-identified race/ethnicity (SIRE) in non-Hispanic whites (94.8%), but was considerably lower in admixed populations including non-Hispanic blacks (88.3%), Mexican Americans (81.8%), and other Hispanics (61.6%), suggesting SIRE does not accurately reflect maternal genetic ancestry, particularly in populations with greater proportions of admixture. Thus, it is important to consider inconsistencies between SIRE and genetic ancestry when performing genetic association studies. The mitochondrial haplogroup data that we have generated, coupled with the epidemiologic variables in NHANES, is a valuable resource for future studies investigating the contribution of mtDNA variation to human health and disease.
mitochondrial haplogroups; NHANES; mitochondrial genetic variation; Sequenom; multiplex genotyping
UDP-glucuronosyltransferase 2 family, polypeptide B4 (UGT2B4) is an important metabolizing enzyme involved in the clearance of many xenobiotics and endogenous substrates, especially steroid hormones and bile acids. The HapMap data show that numerous SNPs upstream of UGT2B4 are in near-perfect linkage disequilibrium with each other and occur at intermediate frequency, indicating that this region might contain a target of natural selection. To investigate this possibility, we chose three regions (4.8 kb in total) for resequencing and observed a striking excess of intermediate-frequency alleles that define two major haplotypes separated by many mutation events and with little differentiation across populations, thus suggesting that the variation pattern upstream of UGT2B4 is highly unusual and may be the result of balancing selection. We propose that this pattern is due to the maintenance of a regulatory polymorphism involved in the fine tuning of UGT2B4 expression so that heterozygous genotypes result in optimal enzyme levels. Considering the important role of steroid hormones in breast cancer susceptibility, we hypothesized that variation in this region could predispose to breast cancer. To test this hypothesis, we genotyped tag SNP rs13129471 in 1,261 patients and 825 normal women of African ancestry from three populations. The frequency comparison indicated that rs13129471 was significantly associated with breast cancer after adjusting for ethnicity [P = 0.003; heterozygous odds ratio (OR) 1.02, 95% confidence interval (CI) 0.81–1.28; homozygous OR 1.50, 95% CI 1.15–1.95]. Our results provide new insights into UGT2B4 sequence variation and indicate that a signal of natural selection may lead to the identification of disease susceptibility variants.
The genetic trait of lactase persistence (LP) is attributable to allelic variants in an enhancer region upstream of the lactase gene, LCT. To date, five different functional alleles, −13910*T, −13907*G, −13915*G, −14009*G and −14010*C, have been identified. The co-occurrence of several of these alleles in Ethiopian lactose digesters leads to a pattern of sequence diversity characteristic of a ‘soft selective sweep’. Here we hypothesise that throughout Africa, where multiple functional alleles co-exist, the enhancer diversity will be greater in groups who are traditional milk drinkers than in non-milk drinkers, as the result of this sort of parallel selection. Samples from 23 distinct groups from 10 different countries were examined. Each group was classified ‘Yes’ or ‘No’ for milk-drinking, and ethnicity, language spoken and geographic location were recorded. Predicted lactase persistence frequency and enhancer diversity were, as hypothesised, higher in the milk drinkers than the non-milk-drinkers, but this was almost entirely accounted for by the Afro-Asiatic language speaking peoples of east Africa. The other groups, including the ‘Nilo-Saharan language speaking’ milk-drinkers, show lower frequencies of LP and lower diversity, and there was a north-east to south-west decline in overall diversity. Amongst the Afro-Asiatic (Cushitic) language speaking Oromo, however, the geographic cline was not evident and the southern pastoralist Borana showed much higher LP frequency and enhancer diversity than the other groups. Together these results reflect the effects of parallel selection, the stochastic processes of the occurrence and spread of the mutations, and time depth of milk drinking tradition.
Electronic supplementary material
The online version of this article (doi:10.1007/s00439-015-1573-2) contains supplementary material, which is available to authorized users.
Eukaryotes employ combinatorial strategies to generate a variety of expression patterns from a relatively small set of regulatory DNA elements. As in any other language, deciphering the mapping between DNA and expression requires an understanding of the set of rules that govern basic principles in transcriptional regulation, the functional elements involved, and the ways in which they combine to orchestrate a transcriptional output. Here, we review current understanding of various grammatical rules, including the effect on expression of the number of transcription factor binding-sites, their location, orientation, affinity and activity; co-association with different factors; and intrinsic nucleosome organization. We review different methods that are used to study the grammar of transcription regulation, highlight gaps in current understanding, and discuss how recent technological advances may be utilized to bridge them.
Transcriptional regulation; gene expression; transcription factor; binding site; nucleosome
Prior studies have identified common genetic variants influencing diabetic and non-diabetic nephropathy, diseases which disproportionately affect African Americans. Recently, exome sequencing techniques have facilitated identification of coding variants on a genome-wide basis in large samples. Exonic variants in known or suspected end-stage kidney disease (ESKD) or nephropathy genes can be tested for their ability to identify association either singly or in combination with known associated common variants. Coding variants in genes with prior evidence for association with ESKD or nephropathy were identified in the NHLBI-ESP GO database and genotyped in 5045 African Americans (3324 cases with type 2 diabetes associated nephropathy [T2D-ESKD] or non-T2D ESKD, and 1721 controls) and 1465 European Americans (568 T2D-ESKD cases and 897 controls). Logistic regression analyses were performed to assess association, with admixture and APOL1 risk status incorporated as covariates. Ten of 31 SNPs were associated in African Americans; four replicated in European Americans. In African Americans, SNPs in OR2L8, OR2AK2, C6orf167 (MMS22L), LIMK2, APOL3, APOL2, and APOL1 were nominally associated (P=1.8×10^−4 to 0.044). Haplotype analysis of common and coding variants increased evidence of association at the OR2L13 and APOL1 loci (P=6.2×10^−5 and 4.6×10^−5, respectively). SNPs replicating in European Americans were in OR2AK2, LIMK2, and APOL2 (P=0.0010 to 0.037). Meta-analyses highlighted four SNPs associated in T2D-ESKD and all-cause ESKD. Results from this study suggest a role for coding variants in the development of diabetic, non-diabetic, and/or all-cause ESKD in African Americans and/or European Americans.
African Americans; Association; European Americans; Exonic Variants; Type 2 Diabetes; Nephropathy
Trisomy 21 (Down syndrome, DS) is the most common human genetic anomaly associated with heart defects. Based on evolutionary conservation, DS-associated heart defects have been modeled in mice. By generating and analyzing mouse mutants carrying different genomic rearrangements in human chromosome 21 (Hsa21) syntenic regions, we found that the triplication of the Tiam1-Kcnj6 region on mouse chromosome 16 (Mmu16) resulted in DS-related cardiovascular abnormalities. In this study, we developed two tandem duplications spanning the Tiam1-Kcnj6 genomic region on Mmu16 using recombinase-mediated genome engineering, Dp(16)3Yey and Dp(16)4Yey, spanning the 2.1Mb Tiam1-Il10rb and 3.7Mb Ifnar1-Kcnj6 regions, respectively. We found that Dp(16)4Yey/+, but not Dp(16)3Yey/+, led to heart defects, suggesting that the triplication of the Ifnar1-Kcnj6 region is sufficient to cause DS-associated heart defects. Our transcriptional analysis of Dp(16)4Yey/+ embryos showed that the Hsa21 gene orthologs located within the duplicated interval were expressed at elevated levels, reflecting the consequences of the gene dosage alterations. Therefore, we have identified a 3.7Mb genomic region, the smallest critical genomic region, for DS-associated heart defects, and our results should set the stage for the final step to establish the identities of the causal gene(s), whose elevated expression(s) directly underlie this major DS phenotype.
Down syndrome; trisomy 21; heart defects - congenital; chromosome engineering; mouse models for human genetic disease; genetic mapping
Expression quantitative trait loci (eQTLs) are currently the most abundant and systematically-surveyed class of functional consequence for genetic variation. Recent genetic studies of gene expression have identified thousands of eQTLs in diverse tissue types for the majority of human genes. Application of this large eQTL catalogue provides an important resource for understanding the molecular basis of common genetic diseases. However, only now have the availability of individuals with full genomes and corresponding advances in functional genomics provided the opportunity to dissect eQTLs to identify causal regulatory variants. Resolving the properties of such causal regulatory variants is improving understanding of the molecular mechanisms that influence traits and guiding the development of new genome-scale approaches to variant interpretation. In this review, we provide an overview of current computational and experimental methods for identifying causal regulatory variants and predicting their phenotypic consequences.
Bone size (BS) is one of the major risk factors for osteoporotic fractures. BS variation is genetically determined to a substantial degree, with heritability over 50%, but the specific genes underlying BS variation are still largely unknown. To identify specific genes for BS in Chinese, an initial genome-wide association scan (GWAS) and a follow-up replication study were performed. In the initial GWAS, a group of 12 contiguous single-nucleotide polymorphisms (SNPs), which span a region of ~25 kb located upstream of the HMGN3 gene (high-mobility group nucleosomal binding domain 3), achieved moderate association signals for spine BS, with P values ranging from 6.2E–05 to 1.8E–06. In the follow-up replication study, eight of the 12 SNPs showed suggestive replicated associations with BS in 1,728 unrelated female Caucasians, who have well-known differences from Chinese in genetic background. The SNPs in the region of the HMGN3 gene formed a tightly linked haplotype block in both Chinese and Caucasians. The results suggest that the genomic region containing the HMGN3 gene may be associated with spine BS in Chinese.
Genetic causes for abdominal aortic aneurysm (AAA) have not been identified and the role of genes associated with familial thoracic aneurysms in AAA has not been explored. We analyzed nine genes associated with familial thoracic aortic aneurysms, the vascular Ehlers–Danlos gene COL3A1 and the MTHFR p.Ala222Val variant in 155 AAA patients. The thoracic aneurysm genes selected for this study were the transforming growth factor-beta pathway genes EFEMP2, FBN1, SMAD3, TGFB2, TGFBR1, TGFBR2, and the smooth muscle cell genes ACTA2, MYH11 and MYLK. Sanger sequencing of all coding exons and exon–intron boundaries of these genes was performed. Patients with at least one first-degree relative with an aortic aneurysm were classified as familial AAA (n = 99), the others as sporadic AAA. We found 47 different rare heterozygous variants in eight genes: two pathogenic, one likely pathogenic, twenty-one variants of unknown significance (VUS) and twenty-three unlikely pathogenic variants. In familial AAA we found one pathogenic and segregating variant (COL3A1 p.Arg491X), one likely pathogenic and segregating (MYH11 p.Arg254Cys), and fifteen VUS. In sporadic patients we found one pathogenic (TGFBR2 p.Ile525Phefs*18) and seven VUS. Thirteen patients had two or more variants. These results show a previously unknown association and overlapping genetic defects between AAA and familial thoracic aneurysms. In this view, genetic testing of these genes specifically or in a genome-wide approach may help to identify the cause of familial and sporadic AAA.
Electronic supplementary material
The online version of this article (doi:10.1007/s00439-015-1567-0) contains supplementary material, which is available to authorized users.
To dissect the genetic architecture of sexual dimorphism in obesity-related traits, we evaluated the sex–genotype interaction, sex-specific heritability and genome-wide linkages for seven measurements related to obesity. A total of 1,365 non-diabetic Chinese subjects from the family study of the Stanford Asia–Pacific Program of Hypertension and Insulin Resistance were used to search for quantitative trait loci (QTLs) responsible for the obesity-related traits. Pleiotropy and co-incidence effects from the QTLs were also examined using the bivariate linkage approach. We found that sex-specific differences in heritability and the genotype–sex interaction effects were significant for most of these traits. Several QTLs with strong linkage evidence were identified after incorporating genotype by sex (G × S) interactions into the linkage mapping, including one QTL for hip circumference [maximum LOD score (MLS) = 4.22, empirical p = 0.000033] and two QTLs for BMI on chromosome 12q, with MLS 3.37 (empirical p = 0.0043) and 3.10 (empirical p = 0.0054). Sex-specific analyses demonstrated that these linkage signals all resulted from females rather than males. Most of these QTLs for obesity-related traits replicated the findings in other ethnic groups. Bivariate linkage analyses showed that several obesity traits were influenced by a common set of QTLs. All regions with linkage signals were observed in one gender, but not in the whole sample, suggesting that the genetic architecture of obesity-related traits does differ by gender. These findings are useful for further identification of the liability genes for these phenotypes through candidate genes or genome-wide association analysis.
Succinate dehydrogenase (SDH) is a crucial metabolic enzyme complex that is involved in ATP production, playing roles in both the tricarboxylic cycle and the mitochondrial respiratory chain (complex II). Isolated complex II deficiency is one of the rarest oxidative phosphorylation disorders with mutations described in three structural subunits and one of the assembly factors; just one case is attributed to recessively inherited SDHD mutations. We report the pathological, biochemical, histochemical and molecular genetic investigations of a male neonate who had left ventricular hypertrophy detected on antenatal scan and died on day one of life. Subsequent postmortem examination confirmed hypertrophic cardiomyopathy with left ventricular non-compaction. Biochemical analysis of his skeletal muscle biopsy revealed evidence of a severe isolated complex II deficiency and candidate gene sequencing revealed a novel homozygous c.275A>G, p.(Asp92Gly) SDHD mutation which was shown to be recessively inherited through segregation studies. The affected amino acid has been reported as a Dutch founder mutation p.(Asp92Tyr) in families with hereditary head and neck paraganglioma. By introducing both mutations into Saccharomyces cerevisiae, we were able to confirm that the p.(Asp92Gly) mutation causes a more severe oxidative growth phenotype than the p.(Asp92Tyr) mutant, and provides functional evidence to support the pathogenicity of the patient’s SDHD mutation. This is only the second case of mitochondrial complex II deficiency due to inherited SDHD mutations and highlights the importance of sequencing all SDH genes in patients with biochemical and histochemical evidence of isolated mitochondrial complex II deficiency.
Obsessive compulsive disorder (OCD) has a complex etiology that encompasses both genetic and environmental factors. However, to date, despite the identification of several promising candidate genes and linkage regions, the genetic causes of OCD are largely unknown. The objective of this study was to conduct linkage studies of childhood-onset OCD, which is thought to have the strongest genetic etiology, in several OCD-affected families from the genetically isolated population of the Central Valley of Costa Rica (CVCR). We used parametric and non-parametric approaches to conduct genome-wide linkage analyses using 5786 single nucleotide polymorphisms (SNPs) in three CVCR families with multiple childhood-onset OCD-affected individuals. We identified areas of suggestive linkage (LOD score ≥2) on chromosomes 1p21, 15q14, 16q24, and 17p12. The strongest evidence for linkage was on chromosome 15q14 (LOD=3.13), identified using parametric linkage analysis with a recessive model, and overlapping a region identified in a prior linkage study using a Caucasian population. Each CVCR family had a haplotype that co-segregated with OCD across a ~7Mbp interval within this region, which contains 18 identified brain expressed genes, several of which are potentially relevant to OCD. Exonic sequencing of the strongest candidate gene in this region, the ryanodine receptor 3 (RYR3), identified several genetic variants of potential interest, although none co-segregated with OCD in all three families. These findings provide evidence that chromosome 15q14 is linked to OCD in families from the CVCR, and support previous findings suggesting that this region may contain one or more OCD susceptibility loci.
Obsessive Compulsive Disorder; Genetic Linkage; Genetic Isolate; Genetic Loci; Humans; Genetic Predisposition to Disease
DNA damage in somatic cells originates from both environmental and endogenous sources, giving rise to mutations through multiple mechanisms. When these mutations affect the function of critical genes, cancer may ensue. Although identifying genomic subsets of mutated genes may inform therapeutic options, a systematic survey of tumor mutational spectra is required to improve our understanding of the underlying mechanisms of mutagenesis involved in cancer etiology. Recent studies have presented genome-wide sets of somatic mutations as a 96-element vector, a procedure that only captures the immediate neighbors of the mutated nucleotide. Herein, we present a 32 × 12 mutation matrix that captures the nucleotide pattern two nucleotides upstream and downstream of the mutation. A somatic autosomal mutation matrix (SAMM) was constructed from tumor-specific mutations derived from each of 909 individual cancer genomes harboring a total of 10,681,843 single-base substitutions. In addition, mechanistic template mutation matrices (MTMMs) representing oxidative DNA damage, ultraviolet-induced DNA damage, 5mCpG deamination, and APOBEC-mediated cytosine mutation, are presented. MTMMs were mapped to the individual tumor SAMMs to determine the maximum contribution of each mutational mechanism to the overall mutation pattern. A Manhattan distance across all SAMM elements between any two tumor genomes was used to determine their relative distance. Employing this metric, 89.5 % of all tumor genomes were found to have a nearest neighbor from the same tissue of origin. When a distance-dependent 6-nearest neighbor classifier was used, 86.9 % of all SAMMs were assigned to the correct tissue of origin. Thus, although tumors from different tissues may have similar mutation patterns, their SAMMs often display signatures that are characteristic of specific tissues.
Electronic supplementary material
The online version of this article (doi:10.1007/s00439-015-1566-1) contains supplementary material, which is available to authorized users.
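The distance and classification steps described in the SAMM abstract above can be sketched as follows. This is a minimal illustration, assuming each tumor is represented by a 32 × 12 matrix of mutation counts that is normalised to proportions before comparison. The inverse-distance vote weighting is one plausible reading of "distance-dependent" and is an assumption, as are the function names `normalise`, `manhattan` and `knn_tissue`.

```python
import numpy as np
from collections import defaultdict

def normalise(counts):
    """Scale a mutation-count matrix so that its elements sum to 1."""
    m = np.asarray(counts, dtype=float)
    total = m.sum()
    return m / total if total > 0 else m

def manhattan(a, b):
    """Sum of absolute element-wise differences between two matrices."""
    return float(np.abs(np.asarray(a, dtype=float) - np.asarray(b, dtype=float)).sum())

def knn_tissue(query, references, tissues, k=6):
    """Predict tissue of origin by an inverse-distance-weighted k-NN vote.

    `references` is a list of normalised matrices and `tissues` the matching
    list of tissue labels. The 1/distance weighting is an assumption.
    """
    dists = np.array([manhattan(query, ref) for ref in references])
    nearest = np.argsort(dists, kind="stable")[:k]
    votes = defaultdict(float)
    for i in nearest:
        votes[tissues[i]] += 1.0 / (dists[i] + 1e-12)  # guard against zero distance
    return max(votes, key=votes.get)
```

With `k=1` the same function reduces to a plain nearest-neighbor lookup, which corresponds to the nearest-neighbor comparison mentioned in the abstract.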
Telomeres, the repetitive sequences that protect the ends of chromosomes, help to maintain genomic integrity and are of key importance to human health. The aim here is to give an overview of the evidence for the importance of telomere length (TL) to the risk of common disease, considering the strengths and weaknesses of different epidemiological study designs. Methods for measuring TL are described, all of which are subject to considerable measurement error. TL declines with age and varies in relation to factors such as smoking and obesity. It is also highly heritable (estimated heritability of ~40 to 50 %), and genome-wide studies have identified a number of associated genetic variants. Epidemiological studies have shown shorter TL to be associated with risk of a number of common diseases, including cardiovascular disease and some cancers. The relationship with cancer appears complex, in that longer telomeres are associated with higher risk of some cancers. Prospective studies of the relationship between TL and disease, where TL is measured before diagnosis, have numerous advantages over retrospective studies, since they avoid the problems of reverse causality and differences in sample handling, but they are still subject to potential confounding. Studies of the genetic predictors of TL in relation to disease risk avoid these drawbacks, although they are not without limitations. Telomere biology is of major importance to the risk of common disease, but the complexities of the relationship are only now beginning to be understood.
In the International Visible Trait Genetics (VisiGen) Consortium, we investigated the genetics of human skin color by combining a series of genome-wide association studies (GWAS) in a total of 17,262 Europeans with functional follow-up of discovered loci. Our GWAS provide the first genome-wide significant evidence for chromosome 20q11.22 harboring the ASIP gene being explicitly associated with skin color in Europeans. In addition, genomic loci at 5p13.2 (SLC45A2), 6p25.3 (IRF4), 15q13.1 (HERC2/OCA2), and 16q24.3 (MC1R) were confirmed to be involved in skin coloration in Europeans. In follow-up gene expression and regulation studies of 22 genes in 20q11.22, we highlighted two novel genes EIF2S2 and GSS, serving as competing functional candidates in this region and providing future research lines. A genetically inferred skin color score obtained from the 9 top-associated SNPs from 9 genes in 940 worldwide samples (HGDP-CEPH) showed a clear gradual pattern in Western Eurasians similar to the distribution of physical skin color, suggesting the used 9 SNPs as suitable markers for DNA prediction of skin color in Europeans and neighboring populations, relevant in future forensic and anthropological
Electronic supplementary material
The online version of this article (doi:10.1007/s00439-015-1559-0) contains supplementary material, which is available to authorized users.
We have assessed copy number variation (CNV) in the male-specific part of the human Y chromosome discovered by array comparative genomic hybridization (array-CGH) in 411 apparently healthy UK males, and validated the findings using SNP genotype intensity data available for 149 of them. After manual curation taking account of the complex duplicated structure of Y-chromosomal sequences, we discovered 22 curated CNV events considered validated or likely, mean 0.93 (range 0–4) per individual. 16 of these were novel. Curated CNV events ranged in size from <1 kb to >3 Mb, and in frequency from 1/411 to 107/411. Of the 24 protein-coding genes or gene families tested, nine showed CNV. These included a large duplication encompassing the AMELY and TBL1Y genes that probably has no phenotypic effect, partial deletions of the TSPY cluster and AZFc region that may influence spermatogenesis, and other variants with unknown functional implications, including abundant variation in the number of RBMY genes and/or pseudogenes, and a novel complex duplication of two segments overlapping the AZFa region and including the 3′ end of the UTY gene.
Electronic supplementary material
The online version of this article (doi:10.1007/s00439-015-1562-5) contains supplementary material, which is available to authorized users.
Epistasis, or gene–gene interaction, results from joint effects of genes on a trait; thus, the same alleles of one gene may display different genetic effects in different genetic backgrounds. In this study, we generalized the coding technique of the natural and orthogonal interaction (NOIA) model to association studies with gene–gene interactions for dichotomous traits and human complex diseases. The NOIA model, which yields uncorrelated estimators of genetic effects, is important for estimating the influence of multiple loci. We conducted simulations and data analyses to evaluate the performance of the NOIA model. Both simulation and real data analyses revealed that the NOIA statistical model had higher power for detecting main genetic effects and usually had higher power for some interaction effects than the usual model. Although genes that predispose people to melanoma have been identified (HERC2 at 15q13.1, MC1R at 16q24.3 and CDKN2A at 9p21.3), gene–gene interactions have not been fully explored for melanoma. By applying the NOIA statistical model to a genome-wide melanoma dataset, we confirmed the previously identified significantly associated genes and found potential regions on chromosomes 5 and 4 that may interact with the HERC2 and MC1R genes, respectively. Our study not only generalized the orthogonal NOIA model but also provided useful insights for understanding the influence of interactions on melanoma risk.
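The abstract above does not state the coding itself. As a minimal sketch of what "orthogonal" means here, the following assumes the one-locus statistical NOIA coding as it is commonly given in the quantitative-genetics literature, and checks numerically that the additive and dominance scores are centered and uncorrelated under the genotype frequencies. The function names and the genotype frequencies are hypothetical, and the generalization to dichotomous traits described above is not reproduced.

```python
from math import isclose


def noia_statistical_coding(p_aa, p_ab, p_bb):
    """Additive and dominance scores for genotypes (aa, Aa, AA).

    Assumed coding (not taken from the abstract): the additive score is the
    allele count centered on its mean; the dominance score is scaled so that
    it is orthogonal to both the intercept and the additive score.
    """
    if not isclose(p_aa + p_ab + p_bb, 1.0):
        raise ValueError("genotype frequencies must sum to 1")
    mean_count = p_ab + 2.0 * p_bb
    additive = [0.0 - mean_count, 1.0 - mean_count, 2.0 - mean_count]
    denom = p_aa + p_bb - (p_aa - p_bb) ** 2
    dominance = [
        -2.0 * p_ab * p_bb / denom,
        4.0 * p_aa * p_bb / denom,
        -2.0 * p_aa * p_ab / denom,
    ]
    return additive, dominance


def weighted_dot(weights, x, y):
    """Frequency-weighted inner product of two genotype score vectors."""
    return sum(w * a * b for w, a, b in zip(weights, x, y))


# Hypothetical genotype frequencies, deliberately not in Hardy-Weinberg proportions.
freqs = (0.5, 0.3, 0.2)
additive, dominance = noia_statistical_coding(*freqs)
ones = [1.0, 1.0, 1.0]

mean_additive = weighted_dot(freqs, additive, ones)
mean_dominance = weighted_dot(freqs, dominance, ones)
cross_term = weighted_dot(freqs, additive, dominance)
print(mean_additive, mean_dominance, cross_term)
```

Because all three inner products vanish, adding or dropping the dominance term does not change the estimate of the additive effect in a linear model, which is the property the abstract refers to as non-correlated estimators.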
Alcohol dependence (AD) is a complex psychiatric disorder that affects about 12.5% of US adults. Genetic factors play a major role in the development of AD.
We conducted a genome-wide association study in 2875 African Americans (AA) including 1719 AD cases and 1156 controls. We used the Illumina Omni 1-Quad microarray, which yielded 769,498 single-nucleotide polymorphisms (SNPs) after quality control. To explore the genetic architecture of AD, we estimated the variance that could be explained by all SNPs and subsets of SNPs using two different approaches to genome partitioning.
We found that 23.9% (s.e. 9.3%) of the phenotypic variance could be explained by using all of the common SNPs on the array. We also found a significant linear relationship between the proportion of the top SNPs used and the phenotypic variance explained by them. Based on genome partitioning of common variants, we also observed a significant linear relationship between the variance explained by a chromosome and its length. Chromosome 4, known to contain several AD risk genes, accounted for excess risk in proportion to its length. By functional partitioning, we found that the genetic variants within 20 kb of genes explained 17.5% (s.e. 11.4%) of the phenotypic variance. Our findings are consistent with the generally accepted view that AD is a highly polygenic trait, i.e., the genetic risk in AD appears to be conferred by multiple variants, each of which may have a small or moderate effect.
Genomewide association study; alcohol dependence; heritability
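To put the reported standard errors in context, the sketch below converts the two estimates quoted in the abstract above (23.9% with s.e. 9.3%, and 17.5% with s.e. 11.4%) into approximate 95% intervals. It assumes a normal approximation, which the source does not state it used; the function name `wald_summary` is hypothetical.

```python
from statistics import NormalDist


def wald_summary(estimate, std_error, level=0.95):
    """Normal-approximation z statistic and confidence interval."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z_crit * std_error
    return {
        "z": estimate / std_error,
        "ci": (estimate - half_width, estimate + half_width),
    }


# Values in percent of phenotypic variance, as quoted in the abstract.
all_snps = wald_summary(23.9, 9.3)
genic_snps = wald_summary(17.5, 11.4)

for label, result in (("all common SNPs", all_snps), ("within 20 kb of genes", genic_snps)):
    low, high = result["ci"]
    print(f"{label}: z = {result['z']:.2f}, 95% CI = ({low:.1f}%, {high:.1f}%)")
```

Both intervals are wide relative to the point estimates, so the partitioned estimates are best read as rough magnitudes under this approximation.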
The direct physiological effects that promote nicotine dependence (ND) are mediated by nicotinic acetylcholine receptors (nAChRs). In line with the genetic and pharmacological basis of addiction, many previous studies have revealed significant associations between variants in the nAChR subunit genes and various measures of ND in different ethnic samples. In this study, we first examined the association of variants in the nAChR subunit α2 (CHRNA2) and α6 (CHRNA6) genes on chromosome 8 with ND using a family sample consisting of 1,730 European Americans (EAs) from 495 families and 1,892 African Americans (AAs) from 424 families (defined as the discovery family sample). ND was assessed by two standard quantitative measures: Smoking Quantity (SQ) and the Fagerström Test for ND (FTND). We found nominal associations for all seven tested SNPs of the genes with at least one ND measure in the EA sample and for two SNPs in CHRNA2 in the AA sample. Of these, the associations of SNP rs3735757 with FTND (P = 0.0068) and of rs2472553 with both ND measures (P = 0.0043 and 0.00086 for SQ and FTND, respectively) remained significant in the EA sample after correction for multiple tests. Further, we found several haplotypes that were significantly associated with ND in the EA sample in CHRNA6 and in both the EA and AA samples in CHRNA2. To confirm the associations of the two genes with ND, we conducted a replication study with an independent case-control sample from the SAGE study, which showed a significant association of the two genes with ND, although the significantly associated SNPs were not always the same in the two samples. Together, these findings indicate that both CHRNA2 and CHRNA6 play a significant role in the etiology of ND in AA and EA smokers. Further replication in additional independent samples is warranted.
CHRNA2; CHRNA6; smoking; tobacco dependence association; meta-analysis
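The abstract above does not say which multiple-testing correction was applied. As an illustration only, the sketch below assumes a Bonferroni correction at a family-wise level of 0.05 over the seven tested SNPs and checks the three reported P values against that threshold. The correction method, the level, and the number of tests counted are all assumptions.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# P values as reported in the abstract for the EA sample.
reported = {
    ("rs3735757", "FTND"): 0.0068,
    ("rs2472553", "SQ"): 0.0043,
    ("rs2472553", "FTND"): 0.00086,
}

# Assumption: the correction counts the seven tested SNPs as seven tests.
threshold = bonferroni_threshold(0.05, 7)
survivors = {key: p for key, p in reported.items() if p < threshold}

# Alternative assumption: seven SNPs times two ND measures, i.e. 14 tests.
strict_threshold = bonferroni_threshold(0.05, 14)
strict_survivors = {key: p for key, p in reported.items() if p < strict_threshold}

print(f"7 tests: threshold {threshold:.5f}, {len(survivors)} associations pass")
print(f"14 tests: threshold {strict_threshold:.5f}, {len(strict_survivors)} associations pass")
```

Under the seven-test assumption all three reported associations pass, which is consistent with the abstract; counting each SNP-by-measure pair as a separate test gives a stricter threshold, so the result depends on how the tests are counted.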
Age-adjusted mortality rates for prostate cancer are higher for African American men compared with those of European ancestry. Recent data suggest that West African men also have elevated risk for prostate cancer relative to European men. Genetic susceptibility to prostate cancer could account for part of this difference.
We conducted a genome-wide association study (GWAS) of prostate cancer in West African men in the Ghana Prostate Study. Association testing was performed using multivariable logistic regression adjusted for age and genetic ancestry for 474 prostate cancer cases and 458 population-based controls on the Illumina HumanOmni-5 Quad BeadChip.
The most promising association was at 10p14 within an intron of a long non-coding RNA (lncRNA RP11-543F8.2) 360 kb centromeric of GATA3 (p=1.29E−7). In sub-analyses, SNPs at 5q31.3 were associated with high Gleason score (≥7) cancers, the strongest of which was a missense SNP in PCDHA1 (rs34575154, p=3.66E−8), and SNPs at Xq28 (rs985081, p=8.66E−9) and 6q21 (rs2185710, p=5.95E−8) were associated with low Gleason score (<7) cancers. We sought to validate our findings in silico in the African Ancestry Prostate Cancer GWAS Consortium, but only one SNP, at 10p14, replicated at p<0.05. Of the 90 prostate cancer loci reported from studies of men of European, Asian or African American ancestry, we were able to test 81 in the Ghana Prostate Study, and 10 of these replicated at p<0.05.
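To gauge the replication count reported above (10 of 81 tested loci at p<0.05), the sketch below computes how many loci would be expected to pass by chance and the binomial probability of seeing 10 or more. It assumes the 81 tests are independent and that each has a 0.05 chance of passing under the null. Neither assumption is stated in the source, and it is not known whether direction of effect was also required.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed exactly."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


n_tested = 81       # loci testable in the Ghana Prostate Study (from the abstract)
n_replicated = 10   # loci with p < 0.05 (from the abstract)
alpha = 0.05        # nominal per-locus threshold (from the abstract)

expected_by_chance = n_tested * alpha
tail_prob = binomial_upper_tail(n_replicated, n_tested, alpha)

print(f"expected by chance: {expected_by_chance:.2f} loci")
print(f"P(at least {n_replicated} of {n_tested} pass by chance) = {tail_prob:.4f}")
```

Under these assumptions roughly four loci would pass by chance alone, so the observed count of 10 is more than twice the chance expectation.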
Further genetic studies of prostate cancer in West African men are needed to confirm our promising susceptibility loci.
prostate cancer; Africa; GWAS; case-control