Despite intensive investigation, the mechanism by which HIV-1 reaches the host cell nucleus is unknown. TNPO3, a karyopherin that mediates nuclear entry of SR-proteins, was shown to be required for HIV-1 infectivity. Some investigators have reported that TNPO3 promotes HIV-1 nuclear import, as would be expected for a karyopherin. Yet an equal number of investigators have failed to obtain evidence that supports this model. Here, a series of experiments was performed to better elucidate the mechanism by which TNPO3 promotes HIV-1 infectivity.
To examine the role of TNPO3 in HIV-1 replication, 2-LTR circles, which are commonly used as a marker for HIV-1 nuclear entry, were cloned after infection of TNPO3 knockdown cells. Sequencing hundreds of these clones provided a potential explanation for the discrepancy in the literature concerning the effect of TNPO3: a significant fraction resulted from autointegration into sites near the LTRs and therefore were not bona fide 2-LTR circles. In response to this finding, new techniques were developed to monitor HIV-1 cDNA, including qPCR reactions that distinguish 2-LTR circles from autointegrants, as well as massively parallel sequencing of HIV-1 cDNA. With these assays, TNPO3 knockdown was found to reduce the levels of 2-LTR circles. This finding was puzzling, though, since previous work has shown that the HIV-1 determinant for TNPO3 dependence is capsid (CA), an HIV-1 protein that forms a megadalton protein lattice in the cytoplasm. TNPO3 imports cellular splicing factors via their SR-domain. Attention was therefore directed towards CPSF6, an SR-protein that binds HIV-1 CA and, when its C-terminal SR-domain is deleted, inhibits HIV-1 nuclear import. The sensitivity of 27 HIV-1 capsid mutants to TNPO3 knockdown was found to correlate strongly with their sensitivity to inhibition by a C-terminal deletion mutant of CPSF6 (R² = 0.883, p < 0.0001). TNPO3 knockdown was then shown to cause CPSF6 to accumulate in the cytoplasm. Mislocalization of CPSF6 to the cytoplasm, whether by TNPO3 knockdown, deletion of the CPSF6 nuclear localization signal, or fusion of CPSF6 to a nuclear export signal, inhibited HIV-1 replication. Additionally, targeting CPSF6 to the nucleus by fusion to a heterologous nuclear localization signal rescued HIV-1 from the inhibitory effects of TNPO3 knockdown. Finally, mislocalization of CPSF6 to the cytoplasm was associated with abnormal stabilization of the HIV-1 CA core.
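The correlation across the 27 capsid mutants is reported as an R² value, presumably the coefficient of determination of a simple linear fit, which for one predictor equals the squared Pearson correlation. The minimal Python sketch below computes that statistic, assuming each mutant is summarized by one number per assay (for example, fold inhibition by TNPO3 knockdown and by the CPSF6 C-terminal deletion mutant). The function name `r_squared` and the example values are hypothetical, and the abstract does not state how the p-value was obtained.

```python
from statistics import mean


def r_squared(xs, ys):
    """Coefficient of determination of a simple linear fit of ys on xs.

    For ordinary least squares with a single predictor this equals the
    squared Pearson correlation coefficient.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 points)")
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sequence has undefined correlation")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical per-mutant fold-inhibition values (not data from the study).
tnpo3_kd = [1.2, 3.5, 8.0, 15.0, 2.1]
cpsf6_delta_c = [1.0, 2.9, 7.1, 16.2, 2.5]
print(f"R^2 = {r_squared(tnpo3_kd, cpsf6_delta_c):.3f}")
```

Because R² squares the correlation, it does not distinguish positive from negative association; the sign would have to be read from `sxy`.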
TNPO3 promotes HIV-1 infectivity indirectly, by shifting the CA-binding protein CPSF6 to the nucleus, thus preventing the excessive HIV-1 CA stability that would otherwise result from cytoplasmic accumulation of CPSF6.
HIV-1; TNPO3; CPSF6; Capsid; Nuclear transport
Down syndrome (DS) is mainly caused by the presence of an extra copy of human chromosome 21 (Hsa21) and is a leading genetic cause of developmental cognitive disabilities in humans. The mouse is a premier model organism for DS because the regions on Hsa21 are syntenically conserved with three regions in the mouse genome, located on mouse chromosome 10 (Mmu10), Mmu16 and Mmu17. With the advance of chromosomal manipulation technologies, new mouse mutants have been generated to mimic DS at both the genotypic and phenotypic levels. Future mouse-based molecular genetic studies may unravel the mechanisms underlying DS-associated developmental cognitive disabilities, which would lay the groundwork for developing effective treatments for this phenotypic manifestation. In this review, we discuss recent progress and future challenges in modeling DS-associated developmental cognitive disability in mice, with an emphasis on hippocampus-related phenotypes.
Down syndrome; Human trisomy 21; Developmental cognitive disabilities; Mouse models; Targeted chromosome manipulation
Natural variation in DNA sequence contributes to individual differences in quantitative traits. While multiple studies have shown genetic control over gene expression variation, few additional cellular traits have been investigated. Here, we investigated the natural variation of NADPH oxidase-dependent hydrogen peroxide (H2O2) release, which is the joint effect of reactive oxygen species (ROS) production, superoxide metabolism and degradation, and is related to a number of human disorders. We assessed the normal variation of H2O2 release in lymphoblastoid cell lines (LCLs) in a family-based 3-generation cohort (CEPH-HapMap) and in 3 population-based cohorts (KORA, GenCord, HapMap). Substantial individual variation was observed, 45% of which was attributable to heritability in the CEPH-HapMap cohort. Genome-wide linkage analysis identified 2 genome-wide significant loci, on Hsa12 and Hsa15. Next, we performed a genome-wide association study (GWAS) for the combined KORA-GenCord cohorts (n = 279) using enhanced marker resolution by imputation (>1.4 million SNPs). We found 5 significant associations (p < 5.00×10⁻⁸) and 54 suggestive associations (p < 1.00×10⁻⁵), one of which confirmed the linked region on Hsa15. To replicate our findings, we performed a GWAS using 58 HapMap individuals and ∼2.1 million SNPs. We identified 40 genome-wide significant and 302 suggestive SNPs, and confirmed genomic signals on Hsa1, Hsa12, and Hsa15. Genetic loci within 900 kb of the known candidate gene p67phox on Hsa1 were identified by GWAS in both cohorts. We did not find replication of individual SNPs across all cohorts, but we did find replication within the same genomic regions. Finally, a highly significant decrease in H2O2 release was observed in individuals with Down syndrome (DS) (p < 2.88×10⁻¹²).
Taken together, our results show strong evidence of genetic control of H2O2 release in LCLs of healthy and DS cohorts and suggest that cellular phenotypes, which are themselves complex, may be used as proxies for the dissection of complex disorders.
DNA sequencing has become cheap, rapid and accurate, allowing us to access thousands of genomes and reveal the extensive variation among individuals. The major problem that arises from this is distinguishing between neutral and pathogenic variants. A recent study by Davis et al., in which a functional screen of all the non-synonymous variants of a newly discovered gene was performed, highlights the value and necessity of characterizing the functional consequences of each genomic variant discovered. This is the main challenge for the advancement of genomic medicine in the years to come.
Myopia affects more people worldwide than any other chronic condition, and it is increasing in all populations across the globe. It affects ∼25% of the U.S. general population between the ages of 12 and 54 years. The present study is a genetic investigation of X-linked high-grade myopia that maps to Xq28.
Myopia is a common vision problem affecting almost one third of the world's population. It can occur as an isolated genetic condition or be associated with other anomalies and/or syndromes. Seventeen myopia loci have been identified on various chromosomes; however, no specific gene mutations have yet been identified.
Two large multigeneration Asian Indian pedigrees (UR006 and UR077) with isolated, nonsyndromic myopia were studied, in which the condition appeared to segregate as an X-linked recessive trait (MYP1; MIM 310460). The degree of myopia was variable in both families, ranging from −6 to −23 D (mean, −8.48 D), with the majority >7.0 D. To map the myopia locus in these families, polymorphic microsatellite markers covering the entire X chromosome were used in linkage analyses performed on 42 genomic DNA samples (13 affected and 29 unaffected individuals) from both families.
Marker DXYS154, which is located within the pseudoautosomal region in distal Xq28 (PAR2; pseudoautosomal region 2), gave a combined maximum LOD score of 5.3 at θ = 0 under an autosomal recessive model. Other markers in the region (near but not within PAR2) that showed no recombination with the phenotype in both families included DXS1108, DXS8087, and F8i13.
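A LOD score is the log10 ratio of the likelihood of the observed segregation at recombination fraction θ to the likelihood under no linkage (θ = 0.5), and LOD scores from independent families add to give a combined score. The Python sketch below computes a two-point LOD only for the textbook case of phase-known, fully informative meioses; the recessive-model pedigree likelihoods actually used for UR006 and UR077 require dedicated linkage software, so this illustrates the quantity rather than reproducing the authors' calculation. The function name `two_point_lod` and the family counts are hypothetical.

```python
import math


def two_point_lod(recombinants: int, meioses: int, theta: float) -> float:
    """log10[L(theta) / L(0.5)] for phase-known, fully informative meioses."""
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie between 0 and meioses")
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    nonrecombinants = meioses - recombinants
    if theta == 0.0:
        # At theta = 0 any recombinant makes the linked model impossible.
        if recombinants > 0:
            return float("-inf")
        log_l_linked = 0.0  # theta**0 * (1 - 0)**n == 1
    else:
        log_l_linked = (recombinants * math.log10(theta)
                        + nonrecombinants * math.log10(1.0 - theta))
    log_l_unlinked = meioses * math.log10(0.5)
    return log_l_linked - log_l_unlinked


# LOD scores from independent families are summed (hypothetical counts).
family_counts = [(0, 9), (0, 8)]
combined = sum(two_point_lod(r, n, 0.0) for r, n in family_counts)
print(f"combined LOD at theta = 0: {combined:.2f}")
```

In this idealized setting each non-recombinant informative meiosis contributes log10(2), about 0.301, at θ = 0, which is why a single recombinant drives the score to minus infinity at that θ.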
Observation of recombination in family UR006 refined the disease locus to a ∼1.25-Mb region flanked by the proximal marker DXS1073 and distal marker DXYS154. Mutation search in exons and splice junctions of candidate genes CTAG2, GAB3, MPP1, F8Bver, FUNDC2, VBP1, RAB39B, CLIC2, TMLHE, SYBL, IL9R, SPRY3, and CXYorf1 did not detect a pathogenic or predisposing variant.
The classic organization of a gene structure has followed the Jacob and Monod bacterial gene model proposed more than 50 years ago. Since then, empirical determinations of the complexity of transcriptomes from yeast to human have blurred the definition and physical boundaries of genes. Using multiple analysis approaches, we have characterized individual gene boundaries mapping on human chromosomes 21 and 22. Analyses of the locations of the 5′ and 3′ transcriptional termini of 492 protein-coding genes revealed that for 85% of these genes the boundaries extend beyond the current annotated termini, most often connecting with exons of transcripts from other well-annotated genes. The biological and evolutionary importance of these chimeric transcripts is underscored by (1) the non-random interconnections of the genes involved, (2) the greater phylogenetic depth of the genes involved in many chimeric interactions, (3) the coordinated expression of connected genes and (4) the close in vivo, three-dimensional proximity of the genomic regions being transcribed and contributing to parts of the chimeric RNAs. The non-random nature of the connections between the genes involved suggests that chimeric transcripts should not be studied in isolation, but together, as an RNA network.
We propose an innovative, integrated, cost-effective health system to combat major non-communicable diseases (NCDs), including cardiovascular, chronic respiratory, metabolic, rheumatologic and neurologic disorders and cancers, which together are the predominant health problem of the 21st century. This proposed holistic strategy involves comprehensive patient-centered integrated care and multi-scale, multi-modal and multi-level systems approaches to tackle NCDs as a common group of diseases. Rather than studying each disease individually, it will take into account the intertwined gene-environment and socio-economic interactions and co-morbidities that lead to individual-specific complex phenotypes. It will implement a road map for predictive, preventive, personalized and participatory (P4) medicine based on a robust and extensive knowledge management infrastructure that contains individual patient information. It will be supported by strategic partnerships involving all stakeholders, including general practitioners associated with patient-centered care. This systems medicine strategy, which takes a holistic approach to disease, is designed so that its results can be used globally, taking into account the needs and specificities of local economies and health systems.
Cancer genomes frequently contain somatic copy number alterations (SCNA) that can significantly perturb the expression level of affected genes and thus disrupt pathways controlling normal growth. In melanoma, many studies have focussed on the copy number and gene expression levels of the BRAF, PTEN and MITF genes, but little has been done to identify new genes using these parameters at the genome-wide scale. Using karyotyping, SNP and CGH arrays, and RNA-seq, we have identified SCNA affecting gene expression (‘SCNA-genes’) in seven human metastatic melanoma cell lines. We showed that the combination of these techniques is useful to identify candidate genes potentially involved in tumorigenesis. Since few of these alterations were recurrent across our samples, we used a protein network-guided approach to determine whether any pathways were enriched in SCNA-genes in one or more samples. From this unbiased genome-wide analysis, we identified 28 significantly enriched pathway modules. Comparison with two large, independent melanoma SCNA datasets showed less than 10% overlap at the individual gene level, but network-guided analysis revealed 66% shared pathways, including all but three of the pathways identified in our data. Frequently altered pathways included WNT, cadherin signalling, angiogenesis and melanogenesis. Additionally, our results emphasize the potential of the EPHA3 and FRS2 gene products, involved in angiogenesis and migration, as possible therapeutic targets in melanoma. Our study demonstrates the utility of network-guided approaches, for both large and small datasets, to identify pathways recurrently perturbed in cancer.
Short (<200 nt) RNA (sRNA) profiling of human cells using various technologies demonstrates an unexpected complexity of sRNAs, with hundreds of thousands of sRNA species present [1–4]. Genetic and in vitro studies argue that these RNAs are not merely degradation products of longer transcripts but could indeed have function [1,2,5]. Furthermore, profiling of RNAs, including sRNAs, can not only reveal novel transcripts but also make clear predictions about the existence and properties of novel biochemical pathways operating in a cell. For example, short RNA profiling in human cells suggested the existence of an unknown capping mechanism operating on cleaved RNA [2], a biochemical component of which was later identified [6]. Here we show that human cells contain a novel type of sRNA carrying non-genomically encoded 5′ polyU tails. The presence of these RNAs at the termini of genes, specifically at the very 3′ ends of known mRNAs, strongly argues for a yet uncharacterized endogenous biochemical pathway in the cell that can copy RNA. We show that this pathway can operate on multiple genes, with specific enrichment for transcripts encoding components of the translational machinery. Finally, we show that genes are also flanked by sense, 3′-polyadenylated sRNAs that are likely to be capped.
Comparative analyses of various mammalian genomes have identified numerous conserved non-coding (CNC) DNA elements that display striking conservation among species, suggesting that they have maintained specific functions throughout evolution. CNC function remains poorly understood, although recent studies have identified a role in gene regulation. We hypothesized that the identification of genomic loci that interact physically with CNCs would provide information on their functions. We have used circular chromosome conformation capture (4C) to characterize interactions of 10 CNCs from human chromosome 21 in K562 cells. The data provide evidence that CNCs are capable of interacting with loci that are enriched for CNCs. The number of trans interactions varies among CNCs; some show interactions with many loci, while others interact with few. Some of the tested CNCs are capable of driving the expression of a reporter gene in the mouse embryo, and associate with the oligodendrocyte genes OLIG1 and OLIG2. Our results underscore the power of chromosome conformation capture for the identification of targets of functional DNA elements and raise the possibility that CNCs exert their functions by physical association with defined genomic regions enriched in CNCs. These CNC-CNC interactions may in part explain their stringent conservation as a group of regulatory sequences.
Finding sequences that control expression of genes is central to understanding genome function. Previous studies have used evolutionary conservation as an indicator of regulatory potential. Here, we present a method for the unbiased in vivo screening of putative enhancers in large DNA regions, using the mouse as a model. We cloned a library of 142 overlapping fragments from a 200-kb murine BAC into a lentiviral vector expressing LacZ from a minimal promoter, and used the resulting vectors to infect fertilized murine oocytes. LacZ staining of E11 embryos, obtained by first using the vectors in pools and then testing individual candidates, led to the identification of 3 enhancers, only one of which shows significant evolutionary conservation. In situ hybridization and 3C/4C experiments suggest that this enhancer, which is active in the neural tube and posterior diencephalon, influences the expression of the Olig1 and/or Olig2 genes. This work provides a new approach for the large-scale in vivo screening of transcriptional regulatory sequences, and further indicates that evolutionary conservation alone is too limiting a criterion for the identification of enhancers.
Dosage imbalance is responsible for several genetic diseases, among which Down syndrome is caused by the trisomy of human chromosome 21.
To elucidate the extent to which dosage imbalance of specific human chromosome 21 genes perturbs distinct molecular pathways, we developed the first mouse embryonic stem (ES) cell bank of human chromosome 21 genes. The human chromosome 21-mouse ES cell bank includes, in triplicate clones, 32 human chromosome 21 genes, which can be overexpressed in an inducible manner. Each clone was transcriptionally profiled in inducing versus non-inducing conditions. Analysis of the transcriptional response yielded results that were consistent with the perturbed gene's known function. Comparison between mouse ES cells containing the whole human chromosome 21 (trisomic mouse ES cells) and mouse ES cells overexpressing single human chromosome 21 genes allowed us to evaluate the contribution of single genes to the trisomic mouse ES cell transcriptome. In addition, for the clones overexpressing the Runx1 gene, we compared the transcriptome changes with the corresponding protein changes by mass spectrometry analysis.
We determined that only a subset of genes produces a strong transcriptional response when overexpressed in mouse ES cells and that this effect can be predicted by taking into account the basal gene expression level and the protein secondary structure. We showed that the human chromosome 21-mouse ES cell bank is an important resource, which may be instrumental towards a better understanding of Down syndrome and other human aneuploidy disorders.
Studies correlating genetic variation to gene expression facilitate the interpretation of common human phenotypes and disease. As functional variants may operate in a tissue-dependent manner, we performed gene expression profiling and association with genetic variants (SNPs) in three cell types from 75 individuals. We detected cell type-specific genetic effects, with 69–80% of regulatory variants operating in a cell type-specific manner, and identified multiple eQTLs per gene, either unique to or shared among cell types, whose number was positively correlated with the number of transcripts per gene. Cell type-specific eQTLs were found at larger distances from genes and had smaller effect sizes, similar to known enhancers. These data suggest that the complete regulatory variant repertoire can only be uncovered in the context of cell type specificity.
The first Swiss human embryonic stem cell (hESC) line, CH-ES1, has shown features of a malignant cell line. It originated from the only blastomere that survived cryopreservation of an embryo, and it more closely resembles teratocarcinoma lines than other hESC lines with respect to its abnormal karyotype and its formation of invasive tumors when injected into SCID mice. To characterize the molecular basis of the oncogenicity of CH-ES1 cells, we looked for abnormal chromosomal copy number (by array comparative genomic hybridization, aCGH) and single nucleotide polymorphisms (SNPs). To see how unique these changes were, we compared these results with data collected from the 2102Ep teratocarcinoma line and four hESC lines (H1, HS293, HS401 and SIVF-02) that displayed normal G-banding results. We identified genomic gains and losses in CH-ES1, including gains in areas containing several oncogenes. These features are similar to those observed in teratocarcinomas and explain the high malignancy of the line. The CH-ES1 line was trisomic for chromosomes 1, 9, 12, 17, 19, 20 and X. The hESC lines that were karyotypically normal by G-banding were also found to have several genomic changes involving genes with known roles in cancer. The largest changes were found in the H1 line at passage 56, in which large 5-Mb duplications in chromosomes 1q32.2 and 22q12.2 were detected, although losses and gains were already seen at passage 22. The changes found in these other lines highlight the importance of assessing the acquisition of genetic changes by hESCs before their use in regenerative medicine applications. They also point to the possibility that the acquisition of genetic changes by ESCs in culture may be used to explore certain aspects of the mechanisms regulating oncogenesis.
A review of the main computational pipelines used to generate the human reference protein-coding gene sets.
The vast majority of the biology of a newly sequenced genome is inferred from the set of encoded proteins. Predicting this set is therefore invariably the first step after the completion of the genome DNA sequence. Here we review the main computational pipelines used to generate the human reference protein-coding gene sets.
To extend the understanding of host genetic determinants of HIV-1 control, we performed a genome-wide association study in a cohort of 2,554 infected Caucasian subjects. The study was powered to detect common genetic variants explaining down to 1.3% of the variability in viral load at set point. We provide overwhelming confirmation of three associations previously reported in a genome-wide study and show further independent effects of both common and rare variants in the Major Histocompatibility Complex (MHC) region. We also examined the polymorphisms reported in previous candidate gene studies and found no support for a role of any variant outside the MHC or the chemokine receptor cluster on chromosome 3. In addition, we evaluated functional variants, copy-number polymorphisms, epistatic interactions, and biological pathways. This study thus represents a comprehensive assessment of common human genetic variation in HIV-1 control in Caucasians.
The ability to spontaneously control HIV-1 upon infection is highly variable between individuals. To evaluate the contribution of variation in human genes to differences in plasma viral load and in disease progression rates, we performed a genome-wide association study in >2,500 HIV-infected individuals. This study achieved two goals: it completed the analysis of common variation influencing viral control, and it re-assessed the majority of previously reported genetic associations. We show that genetic variants located near the HLA-B and HLA-C genes are the strongest determinants of viral control, and that other independent associations exist in the same region of chromosome 6, the Major Histocompatibility Complex, known to contain a large number of genes involved in immune defense. We could not replicate most of the previously published associations with HIV candidate genes in this large, well-characterized cohort. Overall, common human genetic variation, together with demographic variables, explains up to 22% of the variability in viral load in the Caucasian population.
Mental retardation in Down syndrome (DS), the most frequent trisomy in humans, varies from moderate to severe. Several studies, in humans and in mouse models, have linked some regions of human chromosome 21 (Hsa21) to cognitive deficits. However, other intervals, such as the telomeric region of Hsa21, may contribute to the DS phenotype, but their role has not yet been investigated in detail. Here we show that trisomy of the 12 genes found in the 0.59-Mb (Abcg1–U2af1) Hsa21 sub-telomeric region in mice (Ts1Yah) produces defects in novel object recognition, open-field and Y-maze tests, similar to other DS models, but induces an improvement in hippocampal-dependent spatial memory in the Morris water maze, along with enhanced and longer-lasting long-term potentiation in vivo in the hippocampus. Overall, we demonstrate the contribution of the Abcg1–U2af1 genetic region to cognitive defects in working and short-term recognition memory in DS models. Increased copy number of the Abcg1–U2af1 interval leads to an unexpected gain of cognitive function in spatial learning. Expression analysis pinpoints several genes, such as Ndufv3, Wdr4, Pknox1 and Cbs, as candidates whose overexpression in the hippocampus might facilitate learning and memory in Ts1Yah mice. Our work unravels the complexity of the combinatorial genetic code modulating different aspects of mental retardation in DS patients. It definitively establishes the contribution of the Abcg1–U2af1 orthologous region to DS etiology and suggests new modulatory pathways for learning and memory.
Detection of rare polymorphisms and causative mutations of genetic diseases in a targeted genomic area has become a major goal for understanding genomic and phenotypic variability. We have interrogated repeat-masked regions of 8.9 Mb on human chromosomes 21 (7.8 Mb) and 7 (1.1 Mb) from an individual from the International HapMap Project (NA12872). We have optimized a method of genomic selection for high-throughput sequencing. Microarray-based selection and sequencing resulted in 260-fold enrichment, with 41% of reads mapping to the target region. 83% of SNPs in the targeted region had at least 4-fold sequence coverage and 54% at least 15-fold. When assaying HapMap SNPs in NA12872, our sequence genotypes are 91.3% concordant in regions with coverage ≥4-fold, and 97.9% concordant in regions with coverage ≥15-fold. About 81% of the SNPs recovered with both thresholds are listed in dbSNP. We observed that regions with low sequence coverage occur in close proximity to low-complexity DNA. Validation experiments using Sanger sequencing were performed for 46 SNPs with 15–20-fold coverage, with a confirmation rate of 96%, suggesting that DNA selection provides an accurate and cost-effective method for identifying rare genomic variants.
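The concordance figures are stratified by sequence depth: only SNPs whose coverage meets a threshold are compared with the HapMap genotypes. A small Python sketch of that bookkeeping follows, assuming unphased diploid genotypes written as two-letter strings. The `SnpCall` record, its fields and the example calls are hypothetical and do not reflect the study's actual data format or values.

```python
from typing import NamedTuple, Optional, Sequence


class SnpCall(NamedTuple):
    called: str     # genotype inferred from the selected-and-sequenced reads, e.g. "AG"
    reference: str  # genotype reported by HapMap for the same individual and SNP
    coverage: int   # read depth at the SNP position


def _unordered(genotype: str) -> str:
    # Unphased genotypes: "AG" and "GA" denote the same call.
    return "".join(sorted(genotype.upper()))


def concordance(calls: Sequence[SnpCall], min_coverage: int) -> Optional[float]:
    """Fraction of SNPs with coverage >= min_coverage whose call matches HapMap."""
    eligible = [c for c in calls if c.coverage >= min_coverage]
    if not eligible:
        return None  # no SNP reaches the threshold
    matches = sum(_unordered(c.called) == _unordered(c.reference) for c in eligible)
    return matches / len(eligible)


# Hypothetical calls, not data from NA12872.
calls = [
    SnpCall("AG", "GA", 22),
    SnpCall("AA", "AG", 6),
    SnpCall("CC", "CC", 3),
    SnpCall("TT", "TT", 17),
]
for threshold in (4, 15):
    print(threshold, concordance(calls, threshold))
```

Raising the threshold trades the number of SNPs assessed for accuracy, which is the same trade-off seen between the 4-fold and 15-fold figures.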
RACE (Rapid Amplification of cDNA Ends) is a widely used approach for transcript identification. Random clone selection from the RACE mixture, however, is an ineffective sampling strategy if the dynamic range of transcript abundances is large. Here, we describe a strategy that uses array hybridization to improve the sampling efficiency of human transcripts. The products of the RACE reaction are hybridized onto tiling arrays, and the exons detected are used to delineate a series of RT-PCR reactions, through which the original RACE mixture is segregated into simpler RT-PCR reactions. These are independently cloned, and randomly selected clones are sequenced. This approach is superior to direct cloning and sequencing of RACE products: it specifically targets novel transcripts, and often results in overall normalization of transcript abundances. We show theoretically and experimentally that this strategy indeed leads to efficient sampling of novel transcripts, and we investigate multiplexing it by pooling RACE reactions from multiple interrogated loci prior to hybridization.
To date, the contribution of disrupted, potentially cis-regulatory conserved non-coding sequences (CNCs) to human disease is most likely underestimated, as no systematic screens for putative deleterious variations in CNCs have been conducted. As a model for monogenic disease, we studied the involvement of genetic changes of CNCs in the cis-regulatory domain of FOXL2 in blepharophimosis syndrome (BPES). Fifty-seven molecularly unsolved BPES patients underwent high-resolution copy number screening and targeted sequencing of CNCs. Apart from three larger distant deletions, a de novo deletion as small as 7.4 kb was found 283 kb 5′ to FOXL2. The deletion appeared to be triggered by an H-DNA-induced double-stranded break (DSB). In addition, it disrupts a novel long non-coding RNA (ncRNA), PISRT1, and 8 CNCs. The regulatory potential of the deleted CNCs was substantiated by in vitro luciferase assays. Interestingly, chromosome conformation capture (3C) of a 625-kb region surrounding FOXL2 in FOXL2-expressing cellular systems revealed physical interactions of three upstream fragments with the FOXL2 core promoter. Importantly, one of these contains the 7.4-kb deleted fragment. Overall, this study revealed the smallest distant deletion causing monogenic disease and has implications for mutation screening in human disease, and in developmental disorders in particular.
Long-range genetic control is an inherent feature of genes harbouring a highly complex spatiotemporal expression pattern, requiring the combined action of multiple cis-regulatory elements such as promoters, enhancers, and silencers. Consequently, disruption of the long-range genetic control of a target gene by genomic rearrangements of regulatory elements may lead to aberrant gene transcription and disease. To date, the contribution of mutated regulatory elements to human disease has rarely been studied. Here, we explored the contribution of genetic changes in potentially cis-regulatory elements of the FOXL2 gene in blepharophimosis syndrome (BPES), a developmental monogenic condition of the eyelids and ovaries. We identified a very small de novo deletion of 7.4 kb causing BPES. Moreover, we studied the functional capacities and chromosome conformation of the deleted region in FOXL2-expressing cellular systems. Interestingly, the chromosome conformation analysis demonstrated the close proximity of the 7.4-kb deleted fragment and two other conserved regions to the FOXL2 core promoter, and the necessity of their integrity for correct FOXL2 expression. Finally, our study revealed the smallest distant deletion causing monogenic disease and emphasized the importance of mutation screening of cis-regulatory elements in human genetic disease.
Using principal component (PC) analysis, we studied the genetic constitution of 3,112 individuals from Europe as portrayed by more than 270,000 single nucleotide polymorphisms (SNPs) genotyped with the Illumina Infinium platform. In cohorts where the sample size was >100, one hundred randomly chosen samples were used for analysis to minimize the sample size effect, resulting in a total of 1,564 samples. This analysis revealed that the genetic structure of the European population correlates closely with geography. The first two PCs highlight the genetic diversity corresponding to the northwest to southeast gradient and position the populations according to their approximate geographic origin. The resulting genetic map forms a triangular structure with a) Finland, b) the Baltic region, Poland and Western Russia, and c) Italy as its vertices, and with d) Central and Western Europe in its centre. Inter- and intra-population genetic differences were quantified by the inflation factor lambda (λ) (ranging from 1.00 to 4.21), the fixation index (Fst) (ranging from 0.000 to 0.023), and the number of markers exhibiting significant allele frequency differences in pair-wise population comparisons. The estimated lambda was used to assess the extent to which association statistics are degraded when two distinct populations are merged directly in an analysis. When the PC analysis was confined to the 1,019 Estonian individuals (0.1% of the Estonian population), a fine structure emerged that correlated with the geography of individual counties. With at least two cohorts available from several countries, genetic substructures were investigated in Czech, Finnish, German, Estonian and Italian populations. Together with previously published data, our results allow the creation of a comprehensive European genetic map that will greatly facilitate inter-population genetic studies, including genome-wide association studies (GWAS).
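The inflation factor λ is conventionally the median of the observed 1-degree-of-freedom association χ² statistics divided by the median expected under the null hypothesis (about 0.455); values above 1 indicate inflation, for example from population structure. The abstract does not say how the test statistics for each pairwise population comparison were generated, so the Python sketch below simply assumes a list of 1-df χ² values is already available. The function name `genomic_inflation` is hypothetical.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom:
# if Z ~ N(0, 1), P(Z**2 <= m) = 0.5 means Phi(sqrt(m)) = 0.75,
# so m = (inverse standard-normal CDF at 0.75) ** 2, about 0.4549.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_inflation(chi2_stats):
    """Genomic inflation factor lambda from 1-df association chi-square statistics."""
    values = list(chi2_stats)
    if not values:
        raise ValueError("need at least one test statistic")
    return median(values) / CHI2_1DF_MEDIAN
```

Using the median rather than the mean keeps λ robust to the small number of markers that may be truly associated, so it mainly reflects genome-wide inflation such as that introduced by merging structured populations.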
Down syndrome (DS) is one of the most frequent congenital birth defects, and the most common genetic cause of mental retardation. In most cases, DS results from the presence of an extra copy of chromosome 21. DS has a complex phenotype, and a major goal of DS research is to identify genotype–phenotype correlations. Cases of partial trisomy 21 and other HSA21 rearrangements associated with DS features could help identify genomic regions associated with specific phenotypes. We have developed a BAC array spanning HSA21q and used array comparative genomic hybridization (aCGH) to enable high-resolution mapping of pathogenic partial aneuploidies and unbalanced translocations involving HSA21. We report the identification and mapping of 30 pathogenic chromosomal aberrations of HSA21, consisting of 19 partial trisomies and 11 partial monosomies for different segments of HSA21. The breakpoints have been mapped to within ∼85 kb. The majority of the breakpoints (26 of 30) for the partial aneuploidies map within a 10-Mb region. Our data argue against a single DS critical region. We identify susceptibility regions for 25 phenotypes for DS and 27 regions for monosomy 21. However, most of these regions are still broad, and more cases are needed to narrow down the phenotypic maps to a reasonable number of candidate genomic elements per phenotype.
Down syndrome; genotype–phenotype correlations; chromosome 21; array CGH
The fraction of experimentally active conserved non-coding sequences within any given cell type is low, so classical assays are unlikely to expose their potential.
Conserved non-coding sequences in the human genome are approximately tenfold more abundant than known genes, and have been hypothesized to mark the locations of cis-regulatory elements. However, the global contribution of conserved non-coding sequences to the transcriptional regulation of human genes is currently unknown. Deeply conserved elements shared between humans and teleost fish predominantly flank genes active during morphogenesis and are enriched for positive transcriptional regulatory elements. However, such deeply conserved elements account for <1% of the conserved non-coding sequences in the human genome, which are predominantly mammalian.
We explored the regulatory potential of a large sample of these 'common' conserved non-coding sequences using a variety of classic assays, including chromatin remodeling, and enhancer/repressor and promoter activity. When tested across diverse human model cell types, we find that the fraction of experimentally active conserved non-coding sequences within any given cell type is low (approximately 5%), and that this proportion increases only modestly when considered collectively across cell types.
The results suggest that classic assays of cis-regulatory potential are unlikely to expose the functional potential of the substantial majority of mammalian conserved non-coding sequences in the human genome.
Disease gene identification has made enormous strides in the past twenty years through functional, positional and candidate gene approaches, and more recently by the exploitation of genome-wide strategies. However, although pathogenic mutations in over 2000 genes have been identified as causative of human diseases, much less is known about the relationship between the molecular defects and mechanisms that lead to disease pathology and symptoms. Recent advances in diverse fields such as genomics, proteomics, cell biology, as well as studies on transgenic animals have greatly accelerated our understanding of the biochemical and cellular basis of many diseases but much still remains to be discovered. The current challenge is to understand the molecular and metabolic pathways by which a particular pathogenic variation leads to a specific phenotype. The study of abnormal conditions is of crucial importance for the understanding of normal physiology and often provides us with the rationale for the development of novel therapeutic strategies.