Minipigs have become important biomedical models for human ailments due to similarities in organ anatomy, physiology, and circadian rhythms relative to humans. The homeostasis of circadian rhythms in both central and peripheral tissues is pivotal for numerous biological processes. Hence, biological rhythm disorders may contribute to the onset of cancers and metabolic disorders including obesity and type II diabetes, amongst others. A tight regulation of circadian clock effectors ensures a rhythmic expression profile of output genes which, depending on cell type, constitute about 3–20% of the transcribed mammalian genome. Central to this system is the negative regulator protein Cryptochrome 1 (CRY1), whose dysfunction or absence has been linked to the pathogenesis of rhythm disorders. In this study, we generated transgenic Bama-minipigs featuring expression of the Cys414-Ala antimorphic human Cryptochrome 1 mutant (hCRY1AP). Using transgenic fibroblasts as nuclear donors, we produced reconstructed embryos by handmade cloning (HMC) and transferred them to surrogate sows. A total of 23 viable piglets were delivered. All were transgenic and seemingly healthy. However, two pigs with high transgene expression succumbed during the first two months. Molecular analyses in epidermal fibroblasts demonstrated disturbances to the expression profile of core circadian clock genes and elevated expression of the proinflammatory cytokines IL-6 and TNF-α, known to be risk factors in cancer and metabolic disorders.
To understand the impact of gut microbes on human health and well-being it is crucial to assess their genetic potential. Here we describe the Illumina-based metagenomic sequencing, assembly and characterization of 3.3 million nonredundant microbial genes, derived from 576.7 Gb sequence, from faecal samples of 124 European individuals. The gene set, ~150 times larger than the human gene complement, contains an overwhelming majority of the prevalent microbial genes of the cohort and likely includes a large proportion of the prevalent human intestinal microbial genes. The genes are largely shared among individuals of the cohort. Over 99% of the genes are bacterial, suggesting that the entire cohort harbours between 1000 and 1150 prevalent bacterial species and each individual at least 160 such species, which are also largely shared. We define and describe the minimal gut metagenome and the minimal gut bacterial genome in terms of functions encoded by the gene set.
There are remarkable disparities among patients of different races with prostate cancer; however, the mechanism underlying this difference remains unclear. Here, we present a comprehensive landscape of the transcriptome profiles of 14 primary prostate cancers and their paired normal counterparts from the Chinese population using RNA-seq, revealing tremendous diversity across prostate cancer transcriptomes with respect to gene fusions, long noncoding RNAs (long ncRNA), alternative splicing and somatic mutations. Three of the 14 tumors (21.4%) harbored a TMPRSS2-ERG fusion, and the low prevalence of this fusion in Chinese patients was further confirmed in an additional tumor set (10/54=18.5%). Notably, two novel gene fusions, CTAGE5-KHDRBS3 (20/54=37%) and USP9Y-TTTY15 (19/54=35.2%), occurred frequently in our patient cohort. Further systematic transcriptional profiling identified numerous long ncRNAs that were differentially expressed in the tumors. An analysis of the correlation between expression of long ncRNA and genes suggested that long ncRNAs may have functions beyond transcriptional regulation. This study yielded new insights into the pathogenesis of prostate cancer in the Chinese population.
prostate cancer; RNA sequencing; gene fusions; long ncRNAs; alternative splicing
The utility of induced pluripotent stem cells (iPSCs) as models to study diseases and as sources for cell therapy depends on the integrity of their genomes. Despite recent publications of DNA sequence variations in the iPSCs, the true scope of such changes for the entire genome is not clear. Here we report the whole-genome sequencing of three human iPSC lines derived from two cell types of an adult donor by episomal vectors. The vector sequence was undetectable in the deeply sequenced iPSC lines. We identified 1058–1808 heterozygous single nucleotide variants (SNVs), but no copy number variants, in each iPSC line. Six to twelve of these SNVs were within coding regions in each iPSC line, but ~50% of them are synonymous changes and the remaining are not selectively enriched for known genes associated with cancers. Our data thus suggest that episome-mediated reprogramming is not inherently mutagenic during integration-free iPSC induction.
Human iPS cells; Reprogramming; Episomal vectors; Integration-free; Genetic mutations; Whole Genome Sequencing
The applications of massively parallel sequencing technology to fetal cell-free DNA (cff-DNA) have brought new insight to non-invasive prenatal diagnosis. However, most previous research based on maternal plasma sequencing has been restricted to fetal aneuploidies. To detect specific parentally inherited mutations, invasive approaches to obtain fetal DNA are the current standard in the clinic because of the experimental complexity and resource consumption of previously reported non-invasive approaches.
Here, we present a simple and effective non-invasive method for accurate fetal genome recovery assisted by parental haplotypes. The parental haplotypes were first inferred using a combined strategy based on a trio and unrelated individuals. With the help of the parental haplotypes, we then employed a hidden Markov model to non-invasively recover the fetal genome through maternal plasma sequencing.
Using a sequencing depth of approximately 44X with an approximate 5.69% cff-DNA concentration, we non-invasively inferred fetal genotype and haplotype under different situations of parental heterozygosity. Our data show that 98.57%, 95.37%, and 98.45% of paternal autosome alleles, maternal autosome alleles, and maternal chromosome X alleles in the fetal haplotypes, respectively, were recovered accurately. Additionally, we obtained efficient coverage or strong linkage of 96.65% of reported Mendelian-disorder genes and 98.90% of complex disease-associated markers.
Our method provides a useful strategy for non-invasive whole fetal genome recovery.
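The hidden Markov model step described above can be illustrated with a deliberately simplified sketch. It is not the authors' model: it assumes two hidden states (which of the two maternal haplotypes the fetus inherited at each heterozygous site), a binary observation per site (which haplotype's allele appears over-represented in plasma), and two hypothetical parameters, switch_prob (recombination between adjacent sites) and correct_prob (chance that the noisy observation points to the true haplotype). Viterbi decoding then smooths isolated noisy calls into long haplotype blocks.

```python
import math


def viterbi(observations, switch_prob=0.01, correct_prob=0.7):
    """Most likely sequence of inherited haplotypes (0 or 1) for noisy calls.

    Toy two-state HMM; switch_prob and correct_prob are illustrative
    assumptions, not values taken from the study.
    """
    if not observations:
        return []
    states = (0, 1)

    def emit(state, obs):
        # Log-probability of seeing `obs` when the true haplotype is `state`.
        return math.log(correct_prob if obs == state else 1 - correct_prob)

    def trans(prev, cur):
        # Log-probability of moving from haplotype `prev` to `cur`.
        return math.log(1 - switch_prob if prev == cur else switch_prob)

    scores = {s: math.log(0.5) + emit(s, observations[0]) for s in states}
    backpointers = []
    for obs in observations[1:]:
        new_scores, pointers = {}, {}
        for s in states:
            best_prev = max(states, key=lambda p: scores[p] + trans(p, s))
            new_scores[s] = scores[best_prev] + trans(best_prev, s) + emit(s, obs)
            pointers[s] = best_prev
        scores = new_scores
        backpointers.append(pointers)

    # Trace back from the best final state.
    state = max(states, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    noisy_calls = [0, 0, 0, 1, 0, 0, 0]  # one isolated, probably spurious call
    print(viterbi(noisy_calls))
```

With these assumed parameters a single discordant site is cheaper to explain as noise than as two recombination events, so the decoded path stays on one haplotype; a long run of discordant sites is instead decoded as a haplotype switch.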
Most colorectal adenocarcinomas are believed to arise from adenomas, which are premalignant lesions. Sequencing the whole exome of the adenoma will help to identify molecular biomarkers that can predict the occurrence of adenocarcinoma more precisely and to understand the molecular pathways underlying the initial stage of colorectal tumorigenesis. We performed exome capture sequencing of the normal mucosa, adenoma and adenocarcinoma tissues from the same patient and sequenced the identified mutations in an additional 73 adenomas and 288 adenocarcinomas. Somatic single nucleotide variations (SNVs) were identified in both the adenoma and adenocarcinoma by comparison with the normal control from the same patient. We identified 12 nonsynonymous somatic SNVs in the adenoma and 42 nonsynonymous somatic SNVs in the adenocarcinoma. Most of these mutations, including those in OR6X1, SLC15A3, KRTHB4, RBFOX1, LAMA3, CDH20, BIRC6, NMBR, GLCCI1, EFR3A, and FTHL17, were newly reported in colorectal adenomas. Functional annotation of these mutated genes showed that multiple cellular pathways, including the Wnt, cell adhesion and ubiquitin-mediated proteolysis pathways, were altered genetically in the adenoma and that the genetic alterations in the same pathways persist in the adenocarcinoma. CDH20 and LAMA3 were mutated in the adenoma while NRXN3 and COL4A6 were mutated in the adenocarcinoma from the same patient, suggesting for the first time that genetic alterations in the cell adhesion pathway occur as early as in the adenoma. Thus, the comparison of genomic mutations between adenoma and adenocarcinoma provides new insight into the molecular events governing the early steps of colorectal tumorigenesis.
There is a rapidly increasing amount of de novo genome assembly using next-generation sequencing (NGS) short reads; however, several big challenges remain to be overcome in order for this to be efficient and accurate. SOAPdenovo has been successfully applied to assemble many published genomes, but it still needs improvement in continuity, accuracy and coverage, especially in repeat regions.
To overcome these challenges, we have developed its successor, SOAPdenovo2, which has the advantage of a new algorithm design that reduces memory consumption in graph construction, resolves more repeat regions in contig assembly, increases coverage and length in scaffold construction, improves gap closing, and is optimized for large genomes.
Benchmarks using the Assemblathon1 and GAGE datasets showed that SOAPdenovo2 greatly surpasses its predecessor SOAPdenovo and is competitive with other assemblers in both assembly length and accuracy. We also provide an updated assembly version of the 2008 Asian (YH) genome using SOAPdenovo2. Here, the contig and scaffold N50 of the YH genome were ~20.9 kbp and ~22 Mbp, respectively, which are 3-fold and 50-fold longer than those of the first published version. The genome coverage increased from 81.16% to 93.91%, and memory consumption was ~2/3 lower at the point of peak memory consumption.
Genome; Assembly; Contig; Scaffold; Error correction; Gap-filling
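The contig and scaffold N50 values quoted above follow the standard definition: the length L such that sequences of length at least L together account for at least half of the total assembly length. A minimal sketch of that calculation follows; the function name n50 and the toy lengths are illustrative and are not taken from SOAPdenovo2 or the YH assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the cumulative sum reaches half the
        # assembly size is the N50.
        if running >= half_total:
            return length


if __name__ == "__main__":
    toy_contigs = [80, 70, 50, 40, 30, 20, 10]  # made-up lengths, total 300
    print(n50(toy_contigs))  # prints 70
```

Because N50 depends only on the length distribution, it says nothing about correctness, which is why the benchmarks above report accuracy and genome coverage alongside it.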
It is evident that epigenetic factors, especially DNA methylation, play essential roles in obesity development. Using the pig as a model, we investigated the systematic association between DNA methylation and obesity. We sampled eight different adipose and two distinct skeletal muscle tissues from three pig breeds living within comparable environments but displaying distinct fat levels. We generated 1,381 gigabases (Gb) of sequence data from 180 methylated DNA immunoprecipitation (MeDIP) libraries, and provided a genome-wide DNA methylation map as well as a gene expression map for adipose and muscle studies. The analysis showed global similarities and differences among breeds, sexes and anatomic locations, and identified the differentially methylated regions (DMRs). The DMRs in promoters are highly associated with obesity development via expression repression of both known obesity-related genes and novel genes. This comprehensive map provides a solid basis for exploring epigenetic mechanisms of adipose deposition and muscle growth.
The pig is an economically important food source, amounting to approximately 40% of all meat consumed worldwide. Pigs also serve as an important model organism because of their similarity to humans at the anatomical, physiological and genetic level, making them very useful for studying a variety of human diseases. A pig strain of particular interest is the miniature pig, specifically the Wuzhishan pig (WZSP), as it has been extensively inbred. Its high level of homozygosity offers increased ease for selective breeding for specific traits and a more straightforward understanding of the genetic changes that underlie its biological characteristics. WZSP also serves as a promising means for applications in surgery, tissue engineering, and xenotransplantation. Here, we report the sequencing and analysis of an inbred WZSP genome.
Our results reveal some unique genomic features, including a relatively high level of homozygosity in the diploid genome, an unusual distribution of heterozygosity, an over-representation of tRNA-derived transposable elements, a small amount of porcine endogenous retrovirus, and a lack of type C retroviruses. In addition, we carried out systematic research on gene evolution, together with a detailed investigation of the counterparts of human drug target genes.
Our results provide the opportunity to more clearly define the genomic character of pig, which could enhance our ability to create more useful pig models.
Wuzhishan pig; Genome; Homozygosis; Transposable element; Endogenous retrovirus; Animal model
Cancers arise through an evolutionary process in which cell populations are subjected to selection; however, to date, the process of bladder cancer, which is one of the most common cancers in the world, remains unknown at a single-cell level.
We carried out single-cell exome sequencing of 66 individual tumor cells from a muscle-invasive bladder transitional cell carcinoma (TCC). Analyses of the somatic mutant allele frequency spectrum and clonal structure revealed that the tumor cells were derived from a single ancestral cell, but that subsequent evolution occurred, leading to two distinct tumor cell subpopulations. By analyzing recurrently mutant genes in an additional cohort of 99 TCC tumors, we identified genes that might play roles in the maintenance of the ancestral clone and in the muscle-invasive capability of subclones of this bladder cancer, respectively.
This work provides a new approach for investigating the genetic details of bladder tumoral changes at the single-cell level and a new method for assessing bladder cancer evolution at a cell-population level.
Single-cell exome sequencing; Bladder cancer; Tumor evolution; Population genetics
Extreme altitude can induce a range of cellular and systemic responses. Although it is known that hypoxia underlies the major changes and that the physiological responses include hemodynamic changes and erythropoiesis, the molecular mechanisms and signaling pathways mediating such changes are largely unknown. To obtain a more complete picture of the transcriptional regulatory landscape and networks involved in extreme altitude response, we followed four climbers on an expedition up Mount Xixiabangma (8,012 m), and collected blood samples at four stages during the climb for mRNA and miRNA expression assays. By analyzing dynamic changes of gene networks in response to extreme altitudes, we uncovered a highly modular network with 7 modules of various functions that changed in response to extreme altitudes. The erythrocyte differentiation module is the most prominently up-regulated, reflecting increased erythrocyte differentiation from hematopoietic stem cells, probably at the expense of differentiation into other cell lineages. These changes are accompanied by coordinated down-regulation of general translation. Network topology and flow analyses also uncovered regulators known to modulate hypoxia responses and erythrocyte development, as well as unknown regulators, such as the OCT4 gene, an important regulator in stem cells and assumed to only function in stem cells. We predicted computationally and validated experimentally that increased OCT4 expression at extreme altitude can directly elevate the expression of hemoglobin genes. Our approach established a new framework for analyzing the transcriptional regulatory network from a very limited number of samples.
DNA methylation aberration and microRNA (miRNA) deregulation have been observed in many types of cancers. A systematic study of methylome and transcriptome in bladder urothelial carcinoma has never been reported.
DNA methylation was profiled by modified methylation-specific digital karyotyping (MMSDK), and the expression of mRNAs and miRNAs was analyzed by digital gene expression (DGE) sequencing, in tumors and matched normal adjacent tissues obtained from 9 bladder urothelial carcinoma patients. By integrated analysis of the -omics data, we found that the significantly enriched pathways disrupted in bladder urothelial carcinoma were primarily related to “neurogenesis” and “cell differentiation”. Furthermore, we identified an intriguing collection of cancer-related genes that were deregulated at the levels of DNA methylation and mRNA expression, and we validated several of these genes (HIC1, SLIT2, RASAL1, and KRT17) by bisulfite sequencing PCR and reverse transcription qPCR in a panel of 33 bladder cancer samples.
We characterized the methylome and transcriptome profiles of bladder urothelial carcinoma, identified a set of significantly enriched key pathways, and screened four aberrantly methylated and expressed genes. In conclusion, our findings open a new avenue for basic bladder cancer research.
Animal breeding via somatic cell nuclear transfer (SCNT) has enormous potential in agriculture and biomedicine. However, because the efficiency of cloning by SCNT is much lower than that of natural breeding or in vitro fertilization (IVF), concerns have been raised about whether SCNT animals are as healthy and epigenetically normal as conventionally bred ones. We therefore conducted genome-wide gene expression and DNA methylation profiling of phenotypically normal cloned pigs and control pigs in two tissues (muscle and liver), using the Affymetrix Porcine expression array as well as modified methylation-specific digital karyotyping (MMSDK) and Solexa sequencing technology. Typical tissue-specific differences with respect to both gene expression and DNA methylation were observed in muscle and liver from cloned as well as control pigs. Gene expression profiles were highly similar between cloned pigs and controls, though a small set of genes showed altered expression. By contrast, cloned pigs differed more from controls in the DNA methylation pattern of unique sequences in both tissues; in particular, a small set of genomic sites differed in methylation status, with a trend towards slightly increased methylation levels in cloned pigs. Molecular network analysis of the genes that contained such differentially methylated loci revealed a significant network related to tissue development. In conclusion, our study showed that phenotypically normal cloned pigs were highly similar to conventionally bred pigs in gene expression, but that moderate alterations in DNA methylation persist, especially in certain unique genomic regions.
Genome-wide gene expression profile using deep sequencing technologies can drive the discovery of cancer biomarkers and therapeutic targets. Such efforts are often limited to profiling the expression signature of either mRNA or microRNA (miRNA) in a single type of cancer.
Here we provided an integrated analysis of the genome-wide mRNA and miRNA expression profiles of three different genitourinary cancers: carcinomas of the bladder, kidney and testis.
Our results highlight the general or cancer-specific roles of several genes and miRNAs that may serve as candidate oncogenes or suppressors of tumor development. Further comparative analyses at the systems level revealed that significant aberrations of the cell adhesion process, p53 signaling, calcium signaling, the ECM-receptor and cell cycle pathways, the DNA repair and replication processes and the immune and inflammatory response processes were the common hallmarks of human cancers. Gene sets showing testicular cancer-specific deregulation patterns were mainly implicated in processes related to male reproductive function, and general disruptions of multiple metabolic pathways and processes related to cell migration were the characteristic molecular events for renal and bladder cancer, respectively. Furthermore, we also demonstrated that tumors with the same histological origins and genes with similar functions tended to group together in a clustering analysis. By assessing the correlation between the expression of each miRNA and its targets, we determined that deregulation of ‘key’ miRNAs may result in the global aberration of one or more pathways or processes as a whole.
This systematic analysis deciphered the molecular phenotypes of three genitourinary cancers and investigated their variations at the miRNA level simultaneously. Our results provided a valuable source for future studies and highlighted some promising genes, miRNAs, pathways and processes that may be useful for diagnostic or therapeutic applications.
Myopia is the most common ocular disorder worldwide, and high myopia in particular is one of the leading causes of blindness. Genetic factors play a critical role in the development of myopia, especially high myopia. Recently, the exome sequencing approach has been successfully used for the disease gene identification of Mendelian disorders. Here we show a successful application of exome sequencing to identify a gene for an autosomal dominant disorder, and we have identified a gene potentially responsible for high myopia in a monogenic form. We captured exomes of two affected individuals from a Han Chinese family with high myopia and performed sequencing analysis by a second-generation sequencer with a mean coverage of 30× and sufficient depth to call variants at ∼97% of each targeted exome. The shared genetic variants of these two affected individuals in the family being studied were filtered against the 1000 Genomes Project and the dbSNP131 database. A mutation A672G in zinc finger protein 644 isoform 1 (ZNF644) was identified as being related to the phenotype of this family. After we performed sequencing analysis of the exons in the ZNF644 gene in 300 sporadic cases of high myopia, we identified an additional five mutations (I587V, R680G, C699Y, 3′UTR+12 C>G, and 3′UTR+592 G>A) in 11 different patients. All these mutations were absent in 600 normal controls. The ZNF644 gene was expressed in human retinal and retinal pigment epithelium (RPE). Given that ZNF644 is predicted to be a transcription factor that may regulate genes involved in eye development, mutation may cause the axial elongation of eyeball found in high myopia patients. Our results suggest that ZNF644 might be a causal gene for high myopia in a monogenic form.
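The filtering step described above (keep variants shared by both affected relatives, then discard those already catalogued in public databases) amounts to a set intersection followed by a set difference. The sketch below assumes variants are represented as (chromosome, position, reference allele, alternate allele) tuples; the function name candidate_variants and all coordinates are hypothetical placeholders, not data from the study.

```python
def candidate_variants(affected_a, affected_b, known_variants):
    """Variants present in both affected individuals but absent from
    a reference collection of known polymorphisms."""
    shared = set(affected_a) & set(affected_b)
    return shared - set(known_variants)


if __name__ == "__main__":
    # Hypothetical toy call sets; coordinates are placeholders only.
    patient_1 = {("chr1", 1000, "A", "G"), ("chr1", 2000, "C", "T"), ("chr2", 500, "G", "A")}
    patient_2 = {("chr1", 1000, "A", "G"), ("chr1", 2000, "C", "T"), ("chr3", 42, "T", "C")}
    public_db = {("chr1", 2000, "C", "T")}  # stands in for 1000 Genomes / dbSNP entries

    print(candidate_variants(patient_1, patient_2, public_db))
```

In practice the comparison would also need consistent variant normalization and genome build across call sets and databases, which this sketch does not address.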
People with myopia see near objects more clearly than objects far away. Myopia is the most common ocular disorder worldwide, with a high prevalence in Asian (40%–70%) and Caucasian (20%–30%) populations. Although the etiologies of myopia have not yet been established, previous studies have indicated the involvement of genetic and environmental factors (such as close working habits, higher education levels, and higher socioeconomic class). Genetic factors play a critical role in the development of myopia, especially high myopia. In this study, we use exome sequencing, a powerful tool for disease gene identification, to identify a gene involved in high myopia in a monogenic form among Han Chinese. Mutations in zinc finger protein 644 isoform 1 (ZNF644) were identified as potentially responsible for the phenotype of high myopia. The main feature of high myopia is axial elongation of the eye globe. Given that ZNF644 is predicted to be a transcription factor that may regulate genes involved in eye development, a mutant ZNF644 protein may impact normal eye development and therefore may underlie the axial elongation of the eye globe in high myopia patients. Further study of the biological function of ZNF644 will provide insight into the pathogenesis of myopia.
A total of $275 million has been committed to The Cancer Genome Atlas Project for genomic mapping of more than 20 types of cancers. The major challenge is to develop high-throughput and cost-effective techniques for human genome sequencing. We developed a targeted exome sequencing technology to routinely determine the human exome sequence. As a proof of concept, we chose a unique patient who developed three high-mortality cancers (breast, gallbladder and lung) to reveal the genetic cause of high cancer susceptibility. A total of 24,545 SNPs were detected. Of these, 10,868 (44.27%) were within coding regions and 1,077 (4.38%) were located in the UTRs. A total of 3,367 genes carried 4,480 non-synonymous mutations in coding sequences (CDS), truncating 30 proteins, and 10 mutations occurred at splice sites that would generate different protein isoforms. Substitutions or premature terminations occurred in 132 proteins encoded by cancer-associated genes. CARD8 was completely lost; ANAPC1 was pre-translationally terminated in the transcripts of one allele. In the Ras-MAPK pathway, 18 genes were homozygously mutated. In addition, 15 growth factors/cytokines and their receptors, 9 transcription factors, 6 proteins in the WNT signaling pathway, and 16 cell surface and extracellular proteins may be dysfunctional. Exome sequencing thus makes individualized cancer therapy feasible.
whole-exome sequencing; cancer; genetic variations; high-cancer-susceptibility
With the advent of second-generation sequencing, the expression of gene transcripts can be digitally measured with high accuracy. The purpose of this study was to systematically profile the expression of both mRNA and miRNA genes in clear cell renal cell carcinoma (ccRCC) using massively parallel sequencing technology.
The expression of mRNAs and miRNAs was analyzed in tumor tissues and matched normal adjacent tissues obtained from 10 ccRCC patients without distant metastases. In a prevalence screen, some of the most interesting results were validated in a large cohort of ccRCC patients.
A total of 404 miRNAs and 9,799 mRNAs were detected to be differentially expressed in the 10 ccRCC patients. We also identified 56 novel miRNA candidates in at least two samples. In addition to confirming that canonical cancer genes and miRNAs (including VEGFA, DUSP9 and ERBB4; miR-210, miR-184 and miR-206) play pivotal roles in ccRCC development, promising novel candidates (such as PNCK and miR-122) without previous annotation in ccRCC carcinogenesis were also discovered in this study. Pathways controlling cell fates (e.g., cell cycle and apoptosis pathways) and cell communication (e.g., focal adhesion and ECM-receptor interaction) were found to be significantly more likely to be disrupted in ccRCC. Additionally, the results of the prevalence screen revealed that the expression of a miRNA gene cluster located on Xq27.3 was consistently downregulated in at least 76.7% of ∼50 ccRCC patients.
Our study provided a two-dimensional map of the mRNA and miRNA expression profiles of ccRCC using deep sequencing technology. Our results indicate that the phenotypic status of ccRCC is characterized by a loss of normal renal function, downregulation of metabolic genes, and upregulation of many signal transduction genes in key pathways. Furthermore, it can be concluded that downregulation of miRNA genes clustered on Xq27.3 is associated with ccRCC.
Analysis across the genome of patterns of DNA methylation reveals a rich landscape of allele-specific epigenetic modification and consequent effects on allele-specific gene expression.
DNA methylation plays an important role in biological processes in human health and disease. Recent technological advances allow unbiased whole-genome DNA methylation (methylome) analysis to be carried out on human cells. Using whole-genome bisulfite sequencing at 24.7-fold coverage (12.3-fold per strand), we report a comprehensive (92.62%) methylome and analysis of the unique sequences in human peripheral blood mononuclear cells (PBMC) from the same Asian individual whose genome was deciphered in the YH project. PBMC constitute an important source for clinical blood tests world-wide. We found that 68.4% of CpG sites and <0.2% of non-CpG sites were methylated, demonstrating that non-CpG cytosine methylation is minor in human PBMC. Analysis of the PBMC methylome revealed a rich epigenomic landscape for 20 distinct genomic features, including regulatory, protein-coding, non-coding, RNA-coding, and repeat sequences. Integration of our methylome data with the YH genome sequence enabled a first comprehensive assessment of allele-specific methylation (ASM) between the two haploid methylomes of any individual and allowed the identification of 599 haploid differentially methylated regions (hDMRs) covering 287 genes. Of these, 76 genes had hDMRs within 2 kb of their transcriptional start sites of which >80% displayed allele-specific expression (ASE). These data demonstrate that ASM is a recurrent phenomenon and is highly correlated with ASE in human PBMCs. Together with recently reported similar studies, our study provides a comprehensive resource for future epigenomic research and confirms new sequencing technology as a paradigm for large-scale epigenomics studies.
Epigenetic modifications such as addition of methyl groups to cytosine in DNA play a role in regulating gene expression. To better understand these processes, knowledge of the methylation status of all cytosine bases in the genome (the methylome) is required. DNA methylation can differ between the two gene copies (alleles) in each cell. Such allele-specific methylation (ASM) can be due to parental origin of the alleles (imprinting), X chromosome inactivation in females, and other as yet unknown mechanisms. This may significantly alter the expression profile arising from different allele combinations in different individuals. Using advanced sequencing technology, we have determined the methylome of human peripheral blood mononuclear cells (PBMC). Importantly, the PBMC were obtained from the same male Han Chinese individual whose complete genome had previously been determined. This allowed us, for the first time, to study genome-wide differences in ASM. Our analysis shows that ASM in PBMC is higher than can be accounted for by regions known to undergo parent-of-origin imprinting and frequently (>80%) correlates with allele-specific expression (ASE) of the corresponding gene. In addition, our data reveal a rich landscape of epigenomic variation for 20 genomic features, including regulatory, coding, and non-coding sequences, and provide a valuable resource for future studies. Our work further establishes whole-genome sequencing as an efficient method for methylome analysis.
Bacterial pathogens typically infect only a limited range of hosts; however, the genetic mechanisms governing host-specificity are poorly understood. The α-proteobacterial genus Bartonella comprises 21 species that cause host-specific intraerythrocytic bacteremia as a hallmark of infection in their respective mammalian reservoirs, including the human-specific pathogens Bartonella quintana and Bartonella bacilliformis that cause trench fever and Oroya fever, respectively. Here, we have identified bacterial factors that mediate host-specific erythrocyte colonization in the mammalian reservoirs. Using mouse-specific Bartonella birtlesii, human-specific Bartonella quintana, cat-specific Bartonella henselae and rat-specific Bartonella tribocorum, we established in vitro adhesion and invasion assays with isolated erythrocytes that fully reproduce the host-specificity of erythrocyte infection as observed in vivo. By signature-tagged mutagenesis of B. birtlesii and mutant selection in a mouse infection model we identified mutants impaired in establishing intraerythrocytic bacteremia. Among 45 abacteremic mutants, five failed to adhere to and invade mouse erythrocytes in vitro. The corresponding genes encode components of the type IV secretion system (T4SS) Trw, demonstrating that this virulence factor laterally acquired by the Bartonella lineage is directly involved in adherence to erythrocytes. Strikingly, ectopic expression of Trw of rat-specific B. tribocorum in cat-specific B. henselae or human-specific B. quintana expanded their host range for erythrocyte infection to rat, demonstrating that Trw mediates host-specific erythrocyte infection. A molecular evolutionary analysis of the trw locus further indicated that the variable, surface-located TrwL and TrwJ might represent the T4SS components that determine host-specificity of erythrocyte parasitism.
In conclusion, we show that the laterally acquired Trw T4SS diversified in the Bartonella lineage to facilitate host-restricted adhesion to erythrocytes in a wide range of mammals.
Pathogens are—as the result of adaptive evolution in their principal host(s)—typically limited in the range of hosts that they can infect successfully. However, infrequently such host-restricted pathogens may undergo a spontaneous host switch, which can lead to the evolution of pathogens with altered host specificity. Most human pathogens evolved this way, and animal-specific pathogens have thus to be considered as an important reservoir for the emergence of novel human pathogens. Despite host-specificity representing a common feature of pathogens, the underlying molecular mechanisms are largely unknown. In this study we have used bacterial pathogens of the genus Bartonella to identify bacterial factors involved in the determination of host specificity. The bartonellae represent an excellent model to study host-specificity as each species is adapted to cause an intracellular infection of erythrocytes exclusively in its respective reservoir host(s). Using a genetic approach in combination with erythrocyte infection models in vitro and in vivo we demonstrate that a surface-located bacterial nanomachine—a so-called type IV secretion system—determines host specificity of erythrocyte infection. Our work sheds light on the molecular basis of host specificity and establishes an experimental model for studying the evolutionary processes facilitating spontaneous host shifts.
The alternative synonymous codons in Corynebacterium glutamicum, a well-known bacterium used in industry for the production of amino acids, have been investigated by multivariate analysis. As C. glutamicum is a GC-rich organism, G and C are expected to predominate at the third position of codons. Indeed, overall codon usage analyses have indicated that C- and/or G-ending codons are predominant in this organism. Through multivariate statistical analysis, apart from mutational selection, we identified three other trends of codon usage variation among the genes. Firstly, the majority of highly expressed genes are scattered towards the positive end of the first axis, whereas the majority of lowly expressed genes are clustered towards the other end of the first axis. Furthermore, the distinct difference between the two sets of genes was that C-ending codons are predominant in putatively highly expressed genes, suggesting that C-ending codons are translationally optimal in this organism. Secondly, the majority of the putatively highly expressed genes tend to be located on the leading strand, which indicates that replicational and transcriptional selection might be invoked. Thirdly, highly expressed genes are more conserved than lowly expressed genes in terms of synonymous and nonsynonymous substitutions among orthologous genes from the genomes of C. glutamicum and C. diphtheriae. We also analyzed other factors such as gene length and hydrophobicity that might influence codon usage and found their contributions to be weak.
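The expectation that G and C predominate at the third codon position can be checked per gene with a GC3 measure. The following is a minimal sketch, not the authors' pipeline: the function names `third_base_counts` and `gc3_fraction` and the toy sequence are hypothetical, and the code assumes an in-frame coding sequence given as a plain string.

```python
from collections import Counter


def third_base_counts(cds: str) -> Counter:
    """Count the bases found at the third position of each complete codon.

    Assumes `cds` is an in-frame coding sequence; trailing bases that do
    not form a complete codon are ignored.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3
    return Counter(seq[i + 2] for i in range(0, usable, 3))


def gc3_fraction(cds: str) -> float:
    """Return the fraction of codons whose third base is G or C."""
    counts = third_base_counts(cds)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence is shorter than one codon")
    return (counts["G"] + counts["C"]) / total


if __name__ == "__main__":
    # Toy sequence (hypothetical): codons ATG, GCC, AAG, TAA.
    toy = "ATGGCCAAGTAA"
    print(third_base_counts(toy))
    print(gc3_fraction(toy))
```

Comparing `gc3_fraction` between putatively highly and lowly expressed gene sets, or the C count alone from `third_base_counts`, would be one simple way to look at the C-ending codon trend described above.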
Important biological and pathological properties are often conserved across species. Although several mouse leukemia models have been well established, the genes deregulated in both human and murine leukemia cells have not been studied systematically. We performed serial analysis of gene expression (SAGE) in both human and murine MLL-ELL or MLL-ENL leukemia cells, and identified 88 genes that appeared to be significantly deregulated in both types of leukemia cells, including 57 genes not reported previously as being deregulated in MLL-associated leukemias. These changes were validated by quantitative PCR. The most up-regulated genes include several HOX genes (e.g., HOXA5, HOXA9 and HOXA10) and MEIS1, which are typical hallmarks of MLL-rearrangement leukemia. The most down-regulated genes include LTF, LCN2, MMP9, S100A8, S100A9, PADI4, TGFBI and CYBB. Notably, the up-regulated genes are enriched in Gene Ontology terms such as “gene expression” and “transcription”, whereas the down-regulated genes are enriched in “signal transduction” and “apoptosis”. We showed that the CpG islands of the down-regulated genes are hypermethylated. We also showed that seven individual microRNAs from the mir-17-92 cluster, which are known to be overexpressed in human MLL-rearrangement leukemias, are also consistently overexpressed in mouse MLL-rearrangement leukemia cells. Nineteen possible targets of these microRNAs were identified and two of them (i.e., APP and RASSF2) were confirmed further by luciferase reporter and mutagenesis assays. The identification and validation of consistent changes of gene expression in human and murine MLL-rearrangement leukemias provides important insights into the genetic basis of MLL-associated leukemogenesis.
MLL-rearrangement leukemia; evolutionary conservation; gene expression; gene ontology; DNA methylation
Understanding the key process of human mutation is important for many aspects of medical genetics and human evolution. In the past, estimates of mutation rates have generally been inferred from phenotypic observations or comparisons of homologous sequences among closely related species [1–3]. Here, we apply new sequencing technology to measure directly one mutation rate, that of base substitutions on the human Y chromosome. The Y chromosomes of two individuals separated by 13 generations were flow sorted and sequenced by Illumina (Solexa) paired-end sequencing to an average depth of 11× and 20×, respectively. Candidate mutations were further examined by capillary sequencing in cell-line and blood DNA from the donors and additional family members. Twelve mutations were confirmed in ∼10.15 Mb; eight of these had occurred in vitro and four in vivo. The latter could be placed in different positions on the pedigree and led to a mutation-rate measurement of 3.0 × 10^-8 mutations/nucleotide/generation (95% CI: 8.9 × 10^-9 to 7.0 × 10^-8), consistent with estimates of 2.3 × 10^-8 to 6.3 × 10^-8 mutations/nucleotide/generation for the same Y-chromosomal region from published human-chimpanzee comparisons, depending on the generation and split times assumed.
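The reported point estimate can be reproduced from the figures in the abstract with a simple division. The sketch below assumes that the rate is the number of in vivo mutations divided by the product of the surveyed bases and the number of generations; the function name `mutation_rate` is hypothetical, and the sketch does not attempt to reproduce the published confidence interval.

```python
def mutation_rate(n_mutations: int, surveyed_bases: float, generations: int) -> float:
    """Mutations per nucleotide per generation.

    Assumes every surveyed base was transmitted through every generation,
    so the number of opportunities is surveyed_bases * generations.
    """
    if surveyed_bases <= 0 or generations <= 0:
        raise ValueError("surveyed_bases and generations must be positive")
    return n_mutations / (surveyed_bases * generations)


if __name__ == "__main__":
    # Figures from the abstract: 4 in vivo mutations, ~10.15 Mb, 13 generations.
    rate = mutation_rate(4, 10.15e6, 13)
    print(f"{rate:.1e} mutations/nucleotide/generation")  # prints 3.0e-08 ...
```

Only the four in vivo mutations enter the calculation; the eight that arose in vitro reflect cell-line culture rather than germline transmission.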
DNA methylation is a widely studied epigenetic mechanism known to correlate with gene repression and genomic stability. Development of sensitive methods for global detection of DNA methylation events is of particular importance.
Here we describe a technique called modified methylation-specific digital karyotyping (MMSDK), based on methylation-specific digital karyotyping (MSDK) with a novel sequencing approach. Briefly, after a tandem digestion of genomic DNA with a methylation-sensitive mapping enzyme and a fragmenting enzyme, short sequence tags are obtained. These tags are amplified, followed by direct, massively parallel sequencing (Solexa 1G Genome Analyzer). This method allows high-throughput and low-cost genome-wide DNA methylation mapping. We applied this method to investigate global DNA methylation profiles for the widely used breast cancer cell lines MCF-7 and MDA-MB-231, which are representative of luminal-like and mesenchymal-like cancer types, respectively. The comparison revealed a highly similar overall DNA methylation pattern for the two cell lines. However, a cohort of individual genomic loci with significantly different DNA methylation status between the two cell lines was identified. Furthermore, we revealed a genome-wide significant correlation between gene expression and the methylation status of gene promoters with CpG islands (CGIs) in the two cancer cell lines, and a correlation between gene expression and the methylation status of promoters without CGIs in MCF-7 cells.
The MMSDK method will be a valuable tool to increase the current knowledge of genome-wide DNA methylation profiles.
Mycobacterial pathogens are a major threat to humans. With the increasing availability of functional genomic data, research on mycobacterial pathogenesis and subsequent control strategies will be greatly accelerated. It has been suggested that genome polymorphisms, namely large sequence polymorphisms, can influence the pathogenicity of different mycobacterial strains. However, there is currently no database dedicated to mycobacterial genome polymorphisms with functional interpretations.
We have developed a mycobacterial database (MyBASE) housing genome polymorphism data and gene functions to provide the mycobacterial research community with a useful information resource and analysis platform. Whole genome comparison data produced by our lab and the novel genome polymorphisms identified were deposited into MyBASE. Extensive literature review of genome polymorphism data, mainly large sequence polymorphisms (LSPs), operon predictions and curated annotations of virulence and essentiality of mycobacterial genes are unique features of MyBASE. Large-scale genomic data integration from public resources makes MyBASE a comprehensive data warehouse useful for current research. All data is cross-linked and can be graphically viewed via a toolbox in MyBASE.
As an integrated platform focused on the collection of experimental data from our own lab and published literature, MyBASE will facilitate analysis of genome structure and polymorphisms, which will provide insight into genome evolution. Importantly, the database will also facilitate the comparison of virulence factors among various mycobacterial strains. MyBASE is freely accessible via http://mybase.psych.ac.cn.
The YH database is a server that allows the user to easily browse and download data from the first Asian diploid genome. The aim of this platform is to facilitate the study of this Asian genome and to enable improved organization and presentation of large-scale personal genome data. Powered by GBrowse, the MapView displays the genome sequences, SNPs, and sequencing reads. The relationships between phenotype and genotype can be searched by location, dbSNP ID, HGMD ID, gene symbol and disease name. A BLAST web service is also provided for aligning query sequences against the YH genome consensus. The YH database is currently one of three personal genome databases; it organizes the original data and analysis results in a user-friendly interface, as part of an effort to achieve fundamental goals for establishing personal medicine. The database is available at http://yh.genomics.org.cn.