The utility of induced pluripotent stem cells (iPSCs) as disease models and as sources for cell therapy depends on the integrity of their genomes. Despite recent reports of DNA sequence variations in iPSCs, the true scope of such changes across the entire genome is not clear. Here we report the whole-genome sequencing of three human iPSC lines derived from two cell types of an adult donor using episomal vectors. The vector sequence was undetectable in the deeply sequenced iPSC lines. We identified 1,058–1,808 heterozygous single nucleotide variants (SNVs), but no copy number variants, in each iPSC line. Six to twelve of these SNVs were within coding regions in each iPSC line; ~50% of them were synonymous changes, and the remainder were not selectively enriched for known cancer-associated genes. Our data thus suggest that episome-mediated reprogramming is not inherently mutagenic during integration-free iPSC induction.
Human iPS cells; Reprogramming; Episomal vectors; Integration-free; Genetic mutations; Whole Genome Sequencing
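The split between synonymous and nonsynonymous coding SNVs in the iPSC study follows directly from the genetic code. The sketch below, assuming an in-frame coding sequence (CDS) on the sense strand and the standard genetic code, classifies a single-base substitution; the function name `classify_coding_snv` is hypothetical, and this is not the annotation pipeline used in the study.

```python
# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_coding_snv(cds: str, pos: int, alt: str) -> str:
    """Classify a substitution at 0-based position `pos` of an in-frame CDS.

    Hypothetical helper for illustration. Returns "synonymous", "missense",
    "nonsense" (a new stop codon, i.e. a truncating change) or "stop_loss".
    """
    cds = cds.upper()
    offset = pos % 3                       # position of the SNV within its codon
    start = pos - offset                   # first base of the affected codon
    ref_codon = cds[start:start + 3]
    alt_codon = ref_codon[:offset] + alt.upper() + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    if ref_aa == "*":
        return "stop_loss"
    return "missense"
```

For example, in the toy CDS `ATGGCTTGGTAA`, changing the third base of `TGG` (Trp) to A gives `TGA` and is classified as nonsense, whereas `GCT` to `GCC` (both Ala) is synonymous.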
Most colorectal adenocarcinomas are believed to arise from adenomas, which are premalignant lesions. Sequencing the whole exome of an adenoma should help identify molecular biomarkers that predict the occurrence of adenocarcinoma more precisely and help elucidate the molecular pathways underlying the initial stage of colorectal tumorigenesis. We performed exome capture sequencing of normal mucosa, adenoma and adenocarcinoma tissues from the same patient and sequenced the identified mutations in an additional 73 adenomas and 288 adenocarcinomas. Somatic single nucleotide variations (SNVs) were identified in both the adenoma and the adenocarcinoma by comparison with the normal control from the same patient. We identified 12 nonsynonymous somatic SNVs in the adenoma and 42 in the adenocarcinoma. Most of these mutations, including those in OR6X1, SLC15A3, KRTHB4, RBFOX1, LAMA3, CDH20, BIRC6, NMBR, GLCCI1, EFR3A, and FTHL17, are newly reported in colorectal adenomas. Functional annotation of the mutated genes showed that multiple cellular pathways, including the Wnt, cell adhesion and ubiquitin-mediated proteolysis pathways, were genetically altered in the adenoma, and that genetic alterations in the same pathways persisted in the adenocarcinoma. CDH20 and LAMA3 were mutated in the adenoma, whereas NRXN3 and COL4A6 were mutated in the adenocarcinoma from the same patient, suggesting for the first time that genetic alterations in the cell adhesion pathway occur as early as the adenoma stage. Thus, comparing genomic mutations between adenoma and adenocarcinoma provides new insight into the molecular events governing the early steps of colorectal tumorigenesis.
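Somatic SNVs in the adenoma and adenocarcinoma were defined by comparison with the patient's own normal mucosa. A minimal sketch of that subtraction, assuming variant calls are already available as (chromosome, position, reference, alternate) records and that read depth at each position in the normal sample is known, might look as follows; the names and the 10-read depth threshold are illustrative assumptions, not the study's parameters.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int      # 1-based genomic coordinate
    ref: str
    alt: str


def somatic_snvs(tumor_calls, normal_calls, normal_depth, min_normal_depth=10):
    """Return tumor SNVs absent from the matched normal sample.

    Hypothetical helper; `min_normal_depth` is an assumed threshold. A tumor call
    is kept only if the normal sample was covered deeply enough at that position
    that a germline variant would have been detected there.
    """
    normal = set(normal_calls)
    somatic = []
    for v in tumor_calls:
        if v in normal:
            continue      # shared with the normal tissue: germline, not somatic
        if normal_depth.get((v.chrom, v.pos), 0) < min_normal_depth:
            continue      # normal too shallow here to rule out a germline variant
        somatic.append(v)
    return sorted(somatic)
```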
It is evident that epigenetic factors, especially DNA methylation, play essential roles in obesity development. Using the pig as a model, here we investigated the systematic association between DNA methylation and obesity. We sampled eight different adipose tissues and two distinct skeletal muscle tissues from three pig breeds raised in comparable environments but displaying distinct fat levels. We generated 1,381 gigabases (Gb) of sequence data from 180 methylated DNA immunoprecipitation (MeDIP) libraries and provide a genome-wide DNA methylation map, as well as a gene expression map, for adipose and muscle studies. The analysis revealed global similarities and differences among breeds, sexes and anatomical locations, and identified differentially methylated regions (DMRs). DMRs in promoters are highly associated with obesity development through repression of the expression of both known obesity-related genes and novel genes. This comprehensive map provides a solid basis for exploring the epigenetic mechanisms of adipose deposition and muscle growth.
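One simple way to screen MeDIP data for candidate DMRs is to count reads in fixed genomic windows, normalize by library size, and flag windows with large fold changes. The sketch below implements only that simplified screen, assuming read start positions on a single chromosome; the 500-bp window, the two-fold (log2 = 1) threshold and the function names are hypothetical, and published MeDIP analyses typically add CpG-density correction and statistical testing.

```python
import math
from collections import Counter


def window_counts(read_starts, window=500):
    """Count MeDIP read start positions (one chromosome) per fixed-size window."""
    return Counter(pos // window for pos in read_starts)


def candidate_dmrs(reads_a, reads_b, window=500, min_log2fc=1.0, pseudo=0.5):
    """Flag windows whose normalized MeDIP signal differs between samples A and B.

    Hypothetical screen with assumed thresholds. Counts are scaled to reads per
    million (RPM) so libraries of different sizes are comparable; `pseudo` avoids
    division by zero in windows with no reads in one sample.
    """
    ca, cb = window_counts(reads_a, window), window_counts(reads_b, window)
    scale_a, scale_b = 1e6 / len(reads_a), 1e6 / len(reads_b)
    flagged = {}
    for w in sorted(set(ca) | set(cb)):
        rpm_a = ca[w] * scale_a + pseudo
        rpm_b = cb[w] * scale_b + pseudo
        lfc = math.log2(rpm_a / rpm_b)
        if abs(lfc) >= min_log2fc:
            flagged[w * window] = round(lfc, 2)   # key: window start coordinate
    return flagged
```

Positive values mark windows with more MeDIP signal (more methylation) in sample A; negative values mark windows enriched in sample B.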
Extreme altitude can induce a range of cellular and systemic responses. Although hypoxia is known to underlie the major changes, and the physiological responses include hemodynamic changes and erythropoiesis, the molecular mechanisms and signaling pathways mediating these changes are largely unknown. To obtain a more complete picture of the transcriptional regulatory landscape and networks involved in the response to extreme altitude, we followed four climbers on an expedition up Mount Xixiabangma (8,012 m) and collected blood samples at four stages of the climb for mRNA and miRNA expression assays. By analyzing dynamic changes in gene networks, we uncovered a highly modular network of 7 modules with various functions that changed in response to extreme altitude. The erythrocyte differentiation module was the most prominently up-regulated, reflecting increased erythrocyte differentiation from hematopoietic stem cells, probably at the expense of differentiation into other cell lineages. These changes were accompanied by coordinated down-regulation of general translation. Network topology and flow analyses also uncovered regulators known to modulate hypoxia responses and erythrocyte development, as well as previously unrecognized regulators such as OCT4, an important regulator in stem cells that was assumed to function only in stem cells. We predicted computationally and validated experimentally that increased OCT4 expression at extreme altitude can directly elevate the expression of hemoglobin genes. Our approach establishes a new framework for analyzing transcriptional regulatory networks from a very limited number of samples.
DNA methylation aberration and microRNA (miRNA) deregulation have been observed in many types of cancers. A systematic study of the methylome and transcriptome in bladder urothelial carcinoma has not previously been reported.
DNA methylation was profiled by modified methylation-specific digital karyotyping (MMSDK), and the expression of mRNAs and miRNAs was analyzed by digital gene expression (DGE) sequencing in tumors and matched normal adjacent tissues from 9 patients with bladder urothelial carcinoma. Integrated analysis of the -omics data showed that the significantly enriched pathways disrupted in bladder urothelial carcinoma were primarily related to “neurogenesis” and “cell differentiation”. Furthermore, we identified an intriguing collection of cancer-related genes that were deregulated at the levels of both DNA methylation and mRNA expression, and we validated several of these genes (HIC1, SLIT2, RASAL1, and KRT17) by bisulfite sequencing PCR and reverse transcription qPCR in a panel of 33 bladder cancer samples.
We characterized the relationship between the methylome and the transcriptome in bladder urothelial carcinoma, identified a set of significantly enriched key pathways, and pinpointed four aberrantly methylated and expressed genes. In conclusion, our findings open a new avenue for basic bladder cancer research.
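The integrated bladder cancer analysis looks for genes whose promoter methylation and mRNA expression change together. A minimal sketch of that join, assuming per-gene tumor-minus-normal methylation differences and log2 expression fold changes have already been computed, is shown below; the thresholds and function name are hypothetical, and only the hypermethylated, down-regulated direction is shown.

```python
def epigenetically_silenced(meth_diff, log2fc, min_meth_diff=0.2, max_log2fc=-1.0):
    """Genes whose promoters gain methylation in tumor while expression drops.

    Hypothetical helper; both thresholds are illustrative assumptions.
    meth_diff: gene -> tumor minus normal promoter methylation level (0-1 scale)
    log2fc:    gene -> log2(tumor / normal) expression
    Only genes measured on both platforms are considered.
    """
    shared = meth_diff.keys() & log2fc.keys()
    return sorted(
        g for g in shared
        if meth_diff[g] >= min_meth_diff and log2fc[g] <= max_log2fc
    )
```

The mirror-image filter (promoter hypomethylation with up-regulation) follows by flipping both comparisons.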
Animal breeding via somatic cell nuclear transfer (SCNT) has enormous potential in agriculture and biomedicine. However, because the efficiency of cloning by SCNT is much lower than that of natural breeding or in vitro fertilization (IVF), concerns have been raised about whether SCNT animals are as healthy and epigenetically normal as conventionally bred ones. We therefore compared genome-wide gene expression and DNA methylation profiles between phenotypically normal cloned pigs and control pigs in two tissues (muscle and liver), using the Affymetrix Porcine expression array as well as modified methylation-specific digital karyotyping (MMSDK) and Solexa sequencing technology. Typical tissue-specific differences in both gene expression and DNA methylation were observed between muscle and liver in cloned as well as control pigs. Gene expression profiles were highly similar between cloned pigs and controls, although a small set of genes showed altered expression. DNA methylation in unique sequences differed more markedly between cloned and control pigs in both tissues. In particular, a small set of genomic sites had a different DNA methylation status, with a trend toward slightly increased methylation in cloned pigs. Molecular network analysis of the genes containing such differentially methylated loci revealed a significant network related to tissue development. In conclusion, our study showed that phenotypically normal cloned pigs are highly similar to conventionally bred pigs in gene expression, but moderate alterations in DNA methylation persist, especially in certain unique genomic regions.
Genome-wide gene expression profiling using deep sequencing technologies can drive the discovery of cancer biomarkers and therapeutic targets. Such efforts are often limited to profiling the expression signature of either mRNA or microRNA (miRNA) in a single type of cancer.
Here we provide an integrated analysis of the genome-wide mRNA and miRNA expression profiles of three different genitourinary cancers: carcinomas of the bladder, kidney and testis.
Our results highlight general or cancer-specific roles for several genes and miRNAs that may serve as candidate oncogenes or tumor suppressors. Further comparative analyses at the systems level revealed that significant aberrations of the cell adhesion process, p53 signaling, calcium signaling, the ECM-receptor and cell cycle pathways, the DNA repair and replication processes, and the immune and inflammatory response processes were common hallmarks of human cancers. Gene sets with testicular cancer-specific deregulation patterns were mainly implicated in processes related to male reproductive function, whereas general disruption of multiple metabolic pathways and of processes related to cell migration were the characteristic molecular events of renal and bladder cancer, respectively. We also demonstrated that tumors with the same histological origin, and genes with similar functions, tended to group together in clustering analysis. By assessing the correlation between the expression of each miRNA and that of its targets, we found that deregulation of ‘key’ miRNAs may result in global aberration of one or more pathways or processes as a whole.
This systematic analysis deciphered the molecular phenotypes of three genitourinary cancers and simultaneously investigated their variation at the miRNA level. Our results provide a valuable resource for future studies and highlight promising genes, miRNAs, pathways and processes that may be useful for diagnostic or therapeutic applications.
Myopia is the most common ocular disorder worldwide, and high myopia in particular is one of the leading causes of blindness. Genetic factors play a critical role in the development of myopia, especially high myopia. Exome sequencing has recently been used successfully to identify the genes underlying Mendelian disorders. Here we apply exome sequencing to an autosomal dominant disorder and identify a gene potentially responsible for a monogenic form of high myopia. We captured the exomes of two affected individuals from a Han Chinese family with high myopia and sequenced them on a second-generation sequencer with a mean coverage of 30×, sufficient depth to call variants at ∼97% of each targeted exome. The genetic variants shared by these two affected individuals were filtered against the 1000 Genomes Project and the dbSNP131 database. A mutation, A672G, in zinc finger protein 644 isoform 1 (ZNF644) was identified as being related to the phenotype of this family. Sequencing the exons of ZNF644 in 300 sporadic cases of high myopia identified five additional mutations (I587V, R680G, C699Y, 3′UTR+12 C>G, and 3′UTR+592 G>A) in 11 different patients. None of these mutations were present in 600 normal controls. ZNF644 was expressed in the human retina and retinal pigment epithelium (RPE). Given that ZNF644 is predicted to be a transcription factor that may regulate genes involved in eye development, mutations in it may cause the axial elongation of the eyeball seen in patients with high myopia. Our results suggest that ZNF644 might be a causal gene for a monogenic form of high myopia.
People with myopia see near objects more clearly than distant ones. Myopia is the most common ocular disorder worldwide, with a high prevalence in Asian (40%–70%) and Caucasian (20%–30%) populations. Although the etiology of myopia has not yet been established, previous studies have implicated both genetic and environmental factors (such as near-work habits, higher education levels, and higher socioeconomic class). Genetic factors play a critical role in the development of myopia, especially high myopia. In this study, we used exome sequencing, a powerful tool for disease gene identification, to identify a gene involved in a monogenic form of high myopia among Han Chinese. Mutations in zinc finger protein 644 isoform 1 (ZNF644) were identified as potentially responsible for the high myopia phenotype. The main feature of high myopia is axial elongation of the eye globe. Given that ZNF644 is predicted to be a transcription factor that may regulate genes involved in eye development, a mutant ZNF644 protein may disrupt normal eye development and thereby underlie the axial elongation of the eye globe in patients with high myopia. Further study of the biological function of ZNF644 will provide insight into the pathogenesis of myopia.
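The family-based filtering step in the myopia study (keep variants shared by both affected relatives, then discard those present in the 1000 Genomes Project or dbSNP131) amounts to set operations. A minimal sketch, assuming each call set and catalog has been reduced to (chromosome, position, reference, alternate) tuples; the function name and example coordinates are hypothetical.

```python
def novel_shared_variants(affected_calls, *catalogs):
    """Variants carried by every affected individual and absent from all catalogs.

    Hypothetical helper.
    affected_calls: one set of (chrom, pos, ref, alt) tuples per affected person;
                    requiring sharing fits dominant segregation in the family.
    catalogs:       sets of known variants, e.g. loaded from dbSNP and 1000 Genomes.
    """
    shared = set.intersection(*affected_calls)
    known = set().union(*catalogs)
    return sorted(shared - known)
```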
A total of $275 million has been committed to The Cancer Genome Atlas (TCGA) project for genomic mapping of more than 20 types of cancer. The major challenge is to develop high-throughput, cost-effective techniques for human genome sequencing. We developed a targeted exome sequencing technology to routinely determine human exome sequences. As a proof of concept, we chose a unique patient who developed three high-mortality cancers (breast, gallbladder and lung cancer) in order to reveal the genetic cause of high cancer susceptibility. A total of 24,545 SNPs were detected: 10,868 (44.27%) were within coding regions and 1,077 (4.38%) were located in UTRs. In coding sequences (CDS), 4,480 nonsynonymous mutations hit 3,367 genes and truncated 30 proteins, and 10 mutations occurred at splice sites, where they could generate different protein isoforms. Substitutions or premature terminations occurred in 132 proteins encoded by cancer-associated genes. CARD8 was completely lost, and the ANAPC1 transcripts from one allele were prematurely terminated. In the Ras-MAPK pathway, 18 genes were homozygously mutated. Fifteen growth factors/cytokines and their receptors, 9 transcription factors, 6 proteins of the WNT signaling pathway, and 16 cell-surface and extracellular proteins may be dysfunctional. Exome sequencing thus makes individualized cancer therapy possible.
whole-exome sequencing; cancer; genetic variations; high-cancer-susceptibility
With the advent of second-generation sequencing, the expression of gene transcripts can be digitally measured with high accuracy. The purpose of this study was to systematically profile the expression of both mRNA and miRNA genes in clear cell renal cell carcinoma (ccRCC) using massively parallel sequencing technology.
The expression of mRNAs and miRNAs was analyzed in tumor tissues and matched normal adjacent tissues from 10 ccRCC patients without distant metastases. In a prevalence screen, some of the most interesting results were validated in a larger cohort of ccRCC patients.
A total of 404 miRNAs and 9,799 mRNAs were found to be differentially expressed in the 10 ccRCC patients. We also identified 56 novel miRNA candidates, each present in at least two samples. In addition to confirming that canonical cancer genes and miRNAs (including VEGFA, DUSP9 and ERBB4; miR-210, miR-184 and miR-206) play pivotal roles in ccRCC development, this study also uncovered promising novel candidates (such as PNCK and miR-122) not previously associated with ccRCC carcinogenesis. Pathways controlling cell fate (e.g., the cell cycle and apoptosis pathways) and cell communication (e.g., focal adhesion and ECM-receptor interaction) were significantly more likely to be disrupted in ccRCC. Additionally, the prevalence screen revealed that the expression of a miRNA gene cluster located on Xq27.3 was consistently downregulated in at least 76.7% of ∼50 ccRCC patients.
Our study provides a two-dimensional map of the mRNA and miRNA expression profiles of ccRCC using deep sequencing technology. Our results indicate that the phenotypic status of ccRCC is characterized by loss of normal renal function, downregulation of metabolic genes, and upregulation of many signal transduction genes in key pathways. Furthermore, downregulation of the miRNA genes clustered on Xq27.3 is associated with ccRCC.
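Differential expression from digital tag counts, as in the ccRCC comparison, is often assessed with the Audic–Claverie test, which gives the probability of observing y tags in one library given x tags in another library of a different size: p(y | x) = (N2/N1)^y (x + y)! / (x! y! (1 + N2/N1)^(x + y + 1)). Whether this particular test was used in the ccRCC study is an assumption here. The sketch evaluates the formula with log-gamma functions to avoid overflow; the function names are hypothetical.

```python
import math


def ac_point_prob(x, y, n1, n2):
    """Audic-Claverie probability of y tags in library 2 given x tags in library 1.

    n1, n2: total tag counts (library sizes) of the two libraries.
    """
    r = n2 / n1
    log_p = (y * math.log(r) + math.lgamma(x + y + 1) - math.lgamma(x + 1)
             - math.lgamma(y + 1) - (x + y + 1) * math.log1p(r))
    return math.exp(log_p)


def ac_two_sided_p(x, y, n1, n2):
    """Two-sided p-value: twice the smaller tail probability, capped at 1.

    Hypothetical helper. In this simple sketch the upper tail is obtained as a
    complement, so tail probabilities below ~1e-16 are reported as roughly 0.
    """
    lower = sum(ac_point_prob(x, k, n1, n2) for k in range(y + 1))      # P(Y <= y)
    upper = max(0.0, 1.0 - lower) + ac_point_prob(x, y, n1, n2)         # P(Y >= y)
    return min(1.0, 2 * min(lower, upper))
```

With equal library sizes, equal counts (x = y = 10) give a p-value near 1, whereas 10 versus 100 tags give a vanishingly small one.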
Genome-wide analysis of DNA methylation patterns reveals a rich landscape of allele-specific epigenetic modification and its consequent effects on allele-specific gene expression.
DNA methylation plays an important role in biological processes in human health and disease. Recent technological advances allow unbiased whole-genome DNA methylation (methylome) analysis of human cells. Using whole-genome bisulfite sequencing at 24.7-fold coverage (12.3-fold per strand), we report a comprehensive (92.62%) methylome and analysis of the unique sequences in human peripheral blood mononuclear cells (PBMC) from the same Asian individual whose genome was deciphered in the YH project. PBMC are an important source for clinical blood tests worldwide. We found that 68.4% of CpG sites and <0.2% of non-CpG sites were methylated, demonstrating that non-CpG cytosine methylation is minor in human PBMC. Analysis of the PBMC methylome revealed a rich epigenomic landscape for 20 distinct genomic features, including regulatory, protein-coding, non-coding, RNA-coding, and repeat sequences. Integrating our methylome data with the YH genome sequence enabled the first comprehensive assessment of allele-specific methylation (ASM) between the two haploid methylomes of an individual and allowed the identification of 599 haploid differentially methylated regions (hDMRs) covering 287 genes. Of these, 76 genes had hDMRs within 2 kb of their transcription start sites, and >80% of these displayed allele-specific expression (ASE). These data demonstrate that ASM is a recurrent phenomenon and is highly correlated with ASE in human PBMCs. Together with recently reported similar studies, our study provides a comprehensive resource for future epigenomic research and confirms new sequencing technology as a paradigm for large-scale epigenomics studies.
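The CpG methylation levels in the PBMC methylome rest on the bisulfite principle: unmethylated cytosines are converted and read as T, whereas methylated cytosines remain C. A minimal sketch, assuming forward-strand reads already aligned without gaps to the reference and represented as (start, sequence) pairs, computes the methylation level C / (C + T) at each reference CpG. Real pipelines must also handle the reverse strand (read as G/A), mapping of converted reads, and non-CpG contexts; the function name is hypothetical.

```python
def cpg_methylation_levels(reference, reads, min_depth=1):
    """Per-CpG methylation level from forward-strand bisulfite reads.

    Hypothetical helper (forward strand, gapless alignments assumed).
    reference: reference sequence.
    reads:     (start, sequence) pairs, 0-based start on `reference`.
    Returns {position of the CpG cytosine: C / (C + T)} for sites with
    at least `min_depth` informative reads.
    """
    reference = reference.upper()
    counts = {i: [0, 0] for i in range(len(reference) - 1)
              if reference[i:i + 2] == "CG"}          # [methylated C, converted T]
    for start, seq in reads:
        for offset, base in enumerate(seq.upper()):
            pos = start + offset
            if pos in counts:
                if base == "C":
                    counts[pos][0] += 1                # protected: methylated
                elif base == "T":
                    counts[pos][1] += 1                # converted: unmethylated
    return {
        pos: meth / (meth + unmeth)
        for pos, (meth, unmeth) in counts.items()
        if meth + unmeth >= min_depth
    }
```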
Epigenetic modifications such as addition of methyl groups to cytosine in DNA play a role in regulating gene expression. To better understand these processes, knowledge of the methylation status of all cytosine bases in the genome (the methylome) is required. DNA methylation can differ between the two gene copies (alleles) in each cell. Such allele-specific methylation (ASM) can be due to parental origin of the alleles (imprinting), X chromosome inactivation in females, and other as yet unknown mechanisms. This may significantly alter the expression profile arising from different allele combinations in different individuals. Using advanced sequencing technology, we have determined the methylome of human peripheral blood mononuclear cells (PBMC). Importantly, the PBMC were obtained from the same male Han Chinese individual whose complete genome had previously been determined. This allowed us, for the first time, to study genome-wide differences in ASM. Our analysis shows that ASM in PBMC is higher than can be accounted for by regions known to undergo parent-of-origin imprinting and frequently (>80%) correlates with allele-specific expression (ASE) of the corresponding gene. In addition, our data reveal a rich landscape of epigenomic variation for 20 genomic features, including regulatory, coding, and non-coding sequences, and provide a valuable resource for future studies. Our work further establishes whole-genome sequencing as an efficient method for methylome analysis.
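Allele-specific methylation at a heterozygous site can be tested by asking whether methylated and unmethylated read counts differ between the two alleles. A generic way to do this, assumed here for illustration and not necessarily the procedure used to call the 599 hDMRs, is a two-sided Fisher's exact test on the 2×2 table of counts; the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Illustrative ASM use (an assumption): rows = allele 1 / allele 2,
    columns = methylated / unmethylated read counts at a heterozygous site.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # Sum every table no more probable than the observed one; the small
    # relative tolerance guards against floating-point ties.
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))
```

For example, 10 methylated reads on one allele and 10 unmethylated reads on the other, `fisher_exact_two_sided(10, 0, 0, 10)`, gives p ≈ 1.1e-5, whereas a balanced table such as (5, 5, 5, 5) gives p = 1.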
Bacterial pathogens typically infect only a limited range of hosts; however, the genetic mechanisms governing host specificity are poorly understood. The α-proteobacterial genus Bartonella comprises 21 species that cause host-specific intraerythrocytic bacteremia as the hallmark of infection in their respective mammalian reservoirs, including the human-specific pathogens Bartonella quintana and Bartonella bacilliformis, which cause trench fever and Oroya fever, respectively. Here, we have identified bacterial factors that mediate host-specific erythrocyte colonization in the mammalian reservoirs. Using mouse-specific Bartonella birtlesii, human-specific Bartonella quintana, cat-specific Bartonella henselae and rat-specific Bartonella tribocorum, we established in vitro adhesion and invasion assays with isolated erythrocytes that fully reproduce the host specificity of erythrocyte infection observed in vivo. By signature-tagged mutagenesis of B. birtlesii and mutant selection in a mouse infection model, we identified mutants impaired in establishing intraerythrocytic bacteremia. Among 45 abacteremic mutants, five failed to adhere to and invade mouse erythrocytes in vitro. The corresponding genes encode components of the type IV secretion system (T4SS) Trw, demonstrating that this virulence factor, laterally acquired by the Bartonella lineage, is directly involved in adherence to erythrocytes. Strikingly, ectopic expression of Trw of rat-specific B. tribocorum in cat-specific B. henselae or human-specific B. quintana expanded their host range for erythrocyte infection to rat, demonstrating that Trw mediates host-specific erythrocyte infection. A molecular evolutionary analysis of the trw locus further indicated that the variable, surface-located TrwL and TrwJ might be the T4SS components that determine the host specificity of erythrocyte parasitism.
In conclusion, we show that the laterally acquired Trw T4SS diversified in the Bartonella lineage to facilitate host-restricted adhesion to erythrocytes in a wide range of mammals.
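Signature-tagged mutagenesis identifies mutants that are present in the inoculum but lost after passage through the host. The sketch below models the readout as per-tag read counts in the input pool and in bacteria recovered from infected animals; this counting readout, the 10-fold depletion threshold, the minimum input count and the function name are all assumptions for illustration, not the screen's actual scoring.

```python
def attenuated_mutants(input_counts, output_counts, max_ratio=0.1, min_input=50):
    """Signature-tagged mutagenesis sketch: find mutants depleted during infection.

    Hypothetical helper with assumed thresholds.
    input_counts:  tag -> count in the inoculum pool
    output_counts: tag -> count recovered from infected animals
    A mutant is a candidate when its share of the output pool is less than
    `max_ratio` times its share of the input pool.
    """
    total_in = sum(input_counts.values())
    total_out = sum(output_counts.values())
    candidates = []
    for tag, n_in in input_counts.items():
        if n_in < min_input:
            continue                         # too rare in the inoculum to judge
        frac_in = n_in / total_in
        frac_out = output_counts.get(tag, 0) / total_out
        if frac_out < max_ratio * frac_in:
            candidates.append(tag)           # depleted: candidate attenuated mutant
    return sorted(candidates)
```

Candidates from such an in vivo screen would then be tested individually, as the Bartonella study did with its in vitro erythrocyte adhesion and invasion assays.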
As a result of adaptive evolution in their principal host(s), pathogens are typically limited in the range of hosts they can infect successfully. Infrequently, however, such host-restricted pathogens may undergo a spontaneous host switch, which can lead to the evolution of pathogens with altered host specificity. Most human pathogens evolved this way, so animal-specific pathogens must be considered an important reservoir for the emergence of novel human pathogens. Although host specificity is a common feature of pathogens, the underlying molecular mechanisms are largely unknown. In this study we used bacterial pathogens of the genus Bartonella to identify bacterial factors involved in determining host specificity. The bartonellae are an excellent model for studying host specificity, as each species is adapted to cause an intracellular infection of erythrocytes exclusively in its respective reservoir host(s). Using a genetic approach combined with in vitro and in vivo erythrocyte infection models, we demonstrate that a surface-located bacterial nanomachine, a so-called type IV secretion system, determines the host specificity of erythrocyte infection. Our work sheds light on the molecular basis of host specificity and establishes an experimental model for studying the evolutionary processes that facilitate spontaneous host shifts.
The alternative synonymous codons in Corynebacterium glutamicum, a bacterium widely used in industry for the production of amino acids, have been investigated by multivariate analysis. Because C. glutamicum is a GC-rich organism, G and C are expected to predominate at the third codon position. Indeed, overall codon usage analyses indicated that C- and/or G-ending codons predominate in this organism. Through multivariate statistical analysis, apart from mutational bias, we identified three other trends in codon usage variation among genes. First, most highly expressed genes are scattered toward the positive end of the first axis, whereas most lowly expressed genes cluster toward the other end. The distinct difference between the two gene sets was that C-ending codons predominate in putatively highly expressed genes, suggesting that C-ending codons are translationally optimal in this organism. Second, most putatively highly expressed genes tend to be located on the leading strand, which suggests that replicational and transcriptional selection might be operating. Third, highly expressed genes are more conserved than lowly expressed genes, in terms of both synonymous and nonsynonymous substitutions, among orthologous genes from the genomes of C. glutamicum and C. diphtheriae. We also analyzed other factors that might influence codon usage, such as gene length and hydrophobicity, and found their contributions to be weak.
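The codon usage measures behind this kind of analysis are simple to compute. A minimal sketch, assuming an in-frame CDS and the standard genetic code, computes GC3 (the fraction of codons ending in G or C) and relative synonymous codon usage (RSCU), the observed count of a codon divided by the mean count of the codons in its synonymous family. The multivariate step (e.g. correspondence analysis of RSCU values) is not shown, and the function names are hypothetical.

```python
from collections import Counter

# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def codons(cds):
    """Split an in-frame CDS into codons, ignoring any trailing partial codon."""
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def gc3(cds):
    """Fraction of codons with G or C at the third position."""
    cs = codons(cds)
    return sum(c[2] in "GC" for c in cs) / len(cs)


def rscu(cds):
    """RSCU = observed codon count * family size / total count of its family."""
    counts = Counter(c for c in codons(cds) if c in CODON_TABLE)  # skip e.g. 'N'
    family_total = Counter()
    for codon, n in counts.items():
        family_total[CODON_TABLE[codon]] += n
    family_size = Counter(CODON_TABLE.values())   # e.g. 4 codons encode Ala
    return {
        codon: n * family_size[CODON_TABLE[codon]] / family_total[CODON_TABLE[codon]]
        for codon, n in counts.items()
    }
```

An RSCU of 1 means a codon is used exactly as often as expected under uniform synonymous usage; values above 1 indicate preference, as reported for C-ending codons in highly expressed C. glutamicum genes.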
Important biological and pathological properties are often conserved across species. Although several mouse leukemia models have been well established, the genes deregulated in both human and murine leukemia cells have not been studied systematically. We used serial analysis of gene expression (SAGE) to profile gene expression in both human and murine MLL-ELL or MLL-ENL leukemia cells and identified 88 genes that appeared to be significantly deregulated in both types of leukemia cells, including 57 genes not previously reported to be deregulated in MLL-associated leukemias. These changes were validated by quantitative PCR. The most up-regulated genes include several HOX genes (e.g., HOXA5, HOXA9 and HOXA10) and MEIS1, which are typical hallmarks of MLL-rearrangement leukemia. The most down-regulated genes include LTF, LCN2, MMP9, S100A8, S100A9, PADI4, TGFBI and CYBB. Notably, the up-regulated genes are enriched in Gene Ontology terms such as “gene expression” and “transcription”, whereas the down-regulated genes are enriched in “signal transduction” and “apoptosis”. We showed that the CpG islands of the down-regulated genes are hypermethylated. We also showed that seven individual microRNAs from the mir-17-92 cluster, which are known to be overexpressed in human MLL-rearrangement leukemias, are also consistently overexpressed in mouse MLL-rearrangement leukemia cells. Nineteen possible targets of these microRNAs were identified, and two of them (APP and RASSF2) were further confirmed by luciferase reporter and mutagenesis assays. The identification and validation of consistent gene expression changes in human and murine MLL-rearrangement leukemias provides important insights into the genetic basis of MLL-associated leukemogenesis.
MLL-rearrangement leukemia; evolutionary conservation; gene expression; gene ontology; DNA methylation
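Gene Ontology enrichment of the kind reported in the MLL leukemia study is commonly assessed with a one-sided hypergeometric test. The study's exact tool is not stated, so the following is a generic sketch; the function and argument names are hypothetical.

```python
from math import comb


def enrichment_p(k, n, K, N):
    """One-sided hypergeometric p-value P(X >= k) for enrichment of one GO term.

    Hypothetical helper.
    N: genes in the background (e.g. all genes detected in the experiment)
    K: background genes annotated with the term
    n: genes in the deregulated list
    k: deregulated genes annotated with the term
    """
    total = comb(N, n)
    upper = min(n, K)          # X cannot exceed the list size or the annotated count
    return sum(comb(K, x) * comb(N - K, n - x) for x in range(k, upper + 1)) / total
```

In practice, the p-values across many GO terms would also be corrected for multiple testing.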
Understanding the key process of human mutation is important for many aspects of medical genetics and human evolution. In the past, mutation rates have generally been inferred from phenotypic observations or from comparisons of homologous sequences among closely related species [1–3]. Here, we apply new sequencing technology to directly measure one mutation rate: that of base substitutions on the human Y chromosome. The Y chromosomes of two individuals separated by 13 generations were flow sorted and sequenced by Illumina (Solexa) paired-end sequencing to average depths of 11× and 20×, respectively. Candidate mutations were further examined by capillary sequencing in cell-line and blood DNA from the donors and additional family members. Twelve mutations were confirmed in ∼10.15 Mb; eight of these had occurred in vitro and four in vivo. The in vivo mutations could be placed at different positions on the pedigree and led to a mutation-rate measurement of 3.0 × 10⁻⁸ mutations/nucleotide/generation (95% CI: 8.9 × 10⁻⁹–7.0 × 10⁻⁸), consistent with estimates of 2.3 × 10⁻⁸–6.3 × 10⁻⁸ mutations/nucleotide/generation for the same Y-chromosomal region from published human-chimpanzee comparisons, depending on the generation and split times assumed.
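The reported Y-chromosome point estimate can be checked with simple arithmetic, assuming the four in vivo mutations are spread over the ∼10.15 Mb examined and the 13 generations separating the two Y chromosomes. This naive calculation reproduces the 3.0 × 10⁻⁸ figure; it does not attempt the confidence interval. The function name is hypothetical.

```python
def mutation_rate(mutations: int, sites: float, generations: int) -> float:
    """Point estimate of the base-substitution rate per nucleotide per generation."""
    return mutations / (sites * generations)


# Four in vivo mutations, ~10.15 Mb examined, 13 generations between the two Y chromosomes.
rate = mutation_rate(4, 10.15e6, 13)
print(f"{rate:.2e}")  # 3.03e-08
```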
DNA methylation is a widely studied epigenetic mechanism known to correlate with gene repression and genomic stability. Development of sensitive methods for global detection of DNA methylation events is of particular importance.
Here we describe a technique called modified methylation-specific digital karyotyping (MMSDK), which combines methylation-specific digital karyotyping (MSDK) with a novel sequencing approach. Briefly, after tandem digestion of genomic DNA with a methylation-sensitive mapping enzyme and a fragmenting enzyme, short sequence tags are obtained. These tags are amplified and then directly sequenced by massively parallel sequencing (Solexa 1G Genome Analyzer). This method allows high-throughput, low-cost, genome-wide DNA methylation mapping. We applied it to investigate the global DNA methylation profiles of two widely used breast cancer cell lines, MCF-7 and MDA-MB-231, which are representative of luminal-like and mesenchymal-like cancer types, respectively. The two cell lines showed highly similar overall DNA methylation patterns. However, we identified a cohort of individual genomic loci with significantly different DNA methylation status between the two cell lines. Furthermore, we found a genome-wide significant correlation between gene expression and the methylation status of gene promoters with CpG islands (CGIs) in both cancer cell lines, and a correlation between gene expression and the methylation status of promoters without CGIs in MCF-7 cells.
The MMSDK method will be a valuable tool for expanding current knowledge of genome-wide DNA methylation profiles.
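In MMSDK, tags arise next to mapping-enzyme sites that were cut, and a methylation-sensitive enzyme does not cut a methylated site, so methylated sites yield no tag. The in silico sketch below illustrates that logic on one strand only. The enzyme choices (AscI, GGCGCGCC, as mapping enzyme and NlaIII, CATG, as fragmenting enzyme), the 17-base tag length and the tag position (starting at the nearest downstream fragmenting site) are assumptions for illustration; the actual tag is defined by the linker and tagging chemistry of the protocol, and the function name is hypothetical.

```python
import re

MAPPING_SITE = "GGCGCGCC"   # assumed methylation-sensitive mapping enzyme site (AscI)
FRAGMENTING_SITE = "CATG"   # assumed fragmenting enzyme site (NlaIII)
TAG_LEN = 17                # hypothetical tag length


def virtual_tags(genome, methylated=frozenset(), tag_len=TAG_LEN):
    """Simplified, forward-strand-only model of MMSDK tag generation.

    Hypothetical helper. `methylated` holds start positions of mapping sites
    treated as methylated (uncut). For every cut site, the tag is taken here as
    the `tag_len` bases beginning at the nearest downstream fragmenting site.
    Returns {mapping-site start: tag sequence}.
    """
    tags = {}
    for m in re.finditer(MAPPING_SITE, genome):
        if m.start() in methylated:
            continue                          # methylated site: not cut, no tag
        frag = genome.find(FRAGMENTING_SITE, m.end())
        if frag == -1 or frag + tag_len > len(genome):
            continue                          # no complete tag available
        tags[m.start()] = genome[frag:frag + tag_len]
    return tags
```

Counting how often each tag is sequenced then gives a per-site signal: loci whose tags disappear in one sample, such as one of the two breast cancer cell lines, are candidates for increased methylation at the mapping site.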
Mycobacterial pathogens are a major threat to humans. With the increasing availability of functional genomic data, research on mycobacterial pathogenesis and subsequent control strategies will be greatly accelerated. It has been suggested that genome polymorphisms, namely large sequence polymorphisms, can influence the pathogenicity of different mycobacterial strains. However, there is currently no database dedicated to mycobacterial genome polymorphisms with functional interpretations.
We have developed a mycobacterial database (MyBASE) that houses genome polymorphism data and gene functions, to provide the mycobacterial research community with a useful information resource and analysis platform. Whole-genome comparison data produced by our lab, and the novel genome polymorphisms identified, were deposited in MyBASE. Unique features of MyBASE include genome polymorphism data compiled from an extensive literature review (mainly large sequence polymorphisms, LSPs), operon predictions, and curated annotations of the virulence and essentiality of mycobacterial genes. Large-scale integration of genomic data from public resources makes MyBASE a comprehensive data warehouse useful for current research. All data are cross-linked and can be viewed graphically via a toolbox in MyBASE.
As an integrated platform focused on the collection of experimental data from our own lab and published literature, MyBASE will facilitate analysis of genome structure and polymorphisms, which will provide insight into genome evolution. Importantly, the database will also facilitate the comparison of virulence factors among various mycobacterial strains. MyBASE is freely accessible via http://mybase.psych.ac.cn.
The YH database is a server that allows users to easily browse and download data from the first Asian diploid genome. The aim of this platform is to facilitate the study of this Asian genome and to improve the organization and presentation of large-scale personal genome data. Powered by GBrowse, the MapView displays the genome sequences, SNPs, and sequencing reads. Relationships between phenotype and genotype can be searched by location, dbSNP ID, HGMD ID, gene symbol and disease name. A BLAST web service is also provided for aligning query sequences against the YH genome consensus. The YH database is currently one of three personal genome databases; it organizes the original data and analysis results in a user-friendly interface as part of an effort toward the fundamental goals of personalized medicine. The database is available at http://yh.genomics.org.cn.
In May 2000, the Beijing Institute of Genomics formally announced the launch of a comprehensive crop genome research project on rice, the Chinese Superhybrid Rice Genome Project (SRGP). SRGP is not simply a sequencing project targeting a single rice (Oryza sativa L.) genome but a full-scale research effort whose ultimate goal is to provide comprehensive basic genomic information and molecular tools, not only to understand the biology of rice, both as an important crop species and as a model organism for cereals, but also to focus on a popular superhybrid rice, LYP9. We have completed the first phase of SRGP and provide the rice research community with a finished genome sequence of an indica variety, 93-11 (the paternal cultivar of LYP9), together with ample data on subspecific (between-subspecies) polymorphisms, transcriptomes and proteomes that are useful for within-species comparative studies. In the second phase, we acquired the genome sequence of the maternal cultivar, PA64S, together with detailed catalogues of genes uniquely expressed in the parental cultivars and the hybrid, as well as allele-specific markers that distinguish parental alleles. Although SRGP is not an open-ended research programme, it has been designed to pave the way for future plant genomics research and applications: to interrogate fundamentals of plant biology, including genome duplication, polyploidy and hybrid vigour; to provide genetic tools for crop breeding; and to shoulder a social responsibility by leading the fight against world hunger. The project is grounded in genomics, a newly developed, industrial-scale research field, and was launched in the world's most populous country. In this review, we summarize our scientific goals and the noteworthy discoveries that open new territory for systematic investigation of the basic and applied biology of rice and other major cereal crops.
rice; genome; transcriptome; proteome; hybrid vigour
Cancer is one of the leading causes of death among human diseases and continues to have a devastating effect on populations around the globe. Current research efforts aim to accelerate our understanding of the molecular basis of cancer and to develop effective means for cancer diagnosis, treatment and prognosis. Altered patterns of epigenetic modification, most importantly DNA methylation, play a critical role in tumorigenesis by regulating oncogene activation, tumor suppressor gene silencing and chromosomal instability. To study the interplay among DNA methylation, gene expression and cancer, we developed a publicly accessible database for human DNA Methylation and Cancer (MethyCancer, http://methycancer.genomics.org.cn). MethyCancer hosts both highly integrated data on DNA methylation, cancer-related genes, mutations and cancers drawn from public resources, and CpG island (CGI) clones derived from our large-scale sequencing. Interconnections between the different data types were analyzed and presented. Furthermore, a search tool was developed to provide user-friendly access to all of the data and their connections. A graphical viewer, MethyView, displays DNA methylation in the context of genomic and genetic data, helping cancer researchers understand the genetic and epigenetic mechanisms behind the dramatic changes in gene expression in tumor cells.
A collection of one million porcine ESTs is described, providing an essential resource for annotation, comparative genomics, assembly of the pig genome sequence, and further porcine transcription studies.
Knowledge of the structure of gene expression is essential for mammalian transcriptomics research. We analyzed a collection of more than one million porcine expressed sequence tags (ESTs), of which two-thirds were generated in the Sino-Danish Pig Genome Project and one-third are from public databases. The Sino-Danish ESTs were generated from one normalized and 97 non-normalized cDNA libraries representing 35 different tissues and three developmental stages.
Using the Distiller package, the ESTs were assembled into roughly 48,000 contigs and 73,000 singletons, of which approximately 25% have a high-confidence match to UniProt. Approximately 6,000 new porcine gene clusters were identified. Expression analysis based on the non-normalized libraries yielded the following findings. The distribution of cluster sizes is scale invariant. Brain and testes are among the tissues with the greatest number of different expressed genes, whereas tissues with more specialized functions, such as developing liver, have fewer expressed genes. There are at least 65 high-confidence housekeeping gene candidates and 876 cDNA library-specific gene candidates. We identified genes differentially expressed between tissues, in particular brain/spinal cord, and found patterns of correlation between genes that are co-expressed in pairs of libraries. Finally, there was remarkable agreement in expression between specialized tissues according to Gene Ontology categories.
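The housekeeping and library-specific categories above can be illustrated with a simple presence/absence screen over per-library EST counts. This is a minimal sketch, not the procedure used in the study: the input layout, the function name `classify_clusters` and both thresholds (`min_fraction`, `min_specific`) are hypothetical, and a real analysis would also need to account for library size and statistical significance.

```python
from typing import Dict, List, Tuple

# Hypothetical input layout: cluster ID -> EST counts, one entry per
# non-normalized cDNA library (same library order for every cluster).
Counts = Dict[str, List[int]]


def classify_clusters(
    counts: Counts,
    min_fraction: float = 0.9,  # assumed cut-off for "broadly expressed"
    min_specific: int = 5,      # assumed minimum ESTs for a library-specific call
) -> Tuple[List[str], List[Tuple[str, int]]]:
    """Split gene clusters into housekeeping and library-specific candidates.

    Housekeeping candidate: detected (count > 0) in at least `min_fraction`
    of all libraries.
    Library-specific candidate: every EST comes from one library, and that
    library contributes at least `min_specific` ESTs.
    """
    housekeeping: List[str] = []
    specific: List[Tuple[str, int]] = []
    for cluster_id, per_library in counts.items():
        detected = [i for i, c in enumerate(per_library) if c > 0]
        if per_library and len(detected) / len(per_library) >= min_fraction:
            housekeeping.append(cluster_id)
        elif len(detected) == 1 and per_library[detected[0]] >= min_specific:
            specific.append((cluster_id, detected[0]))
    return housekeeping, specific


if __name__ == "__main__":
    toy = {
        "c1": [3, 5, 2, 4, 1],   # present in all five libraries
        "c2": [0, 0, 12, 0, 0],  # only in library 2, with many ESTs
        "c3": [0, 1, 0, 0, 0],   # only in library 1, but too few ESTs
    }
    housekeeping, specific = classify_clusters(toy)
    print("housekeeping:", housekeeping)
    print("library-specific:", specific)
```

On the toy input, `c1` is reported as a housekeeping candidate and `c2` as specific to library 2, while `c3` is not called because a single EST is too little evidence.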
This EST collection, the largest to date in pig, represents an essential resource for annotation, comparative genomics, assembly of the pig genome sequence, and further porcine transcription studies.
Streptococcus suis serotype 2 (SS2) is an important zoonotic pathogen that has caused more than 200 cases of severe human infection worldwide, with hallmarks including meningitis, septicemia and arthritis. Very recently, SS2 has been recognized as an etiological agent of streptococcal toxic shock syndrome (STSS), which among streptococci was originally associated with Streptococcus pyogenes (GAS). However, the molecular mechanisms underlying STSS are poorly understood.
Methods and Findings
To elucidate the genetic determinants of STSS caused by SS2, we sequenced the whole genomes of three different Chinese SS2 strains. Comparative genomics, combined with several lines of experiments including experimental animal infection, PCR assays and expression analysis, was used to further dissect a candidate pathogenicity island (PAI). Here we provide, for the first time, molecular insight into the highly invasive Chinese SS2 isolates that caused two large-scale outbreaks of human STSS in China. A candidate PAI of ∼89 kb, designated 89K and specific to virulent Chinese SS2 isolates, was investigated at the genomic level. It shares universal properties of PAIs, such as a distinct GC content, consistent with its pivotal role in STSS and high virulence.
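A distinct GC content is one of the signals commonly used to flag candidate genomic islands. A minimal sliding-window sketch of that idea is shown below; it is a generic screen, not the analysis performed in this study, and the function names, window size, step and `min_delta` threshold are assumptions chosen for illustration.

```python
from typing import List, Tuple


def gc_fraction(seq: str) -> float:
    """Fraction of G/C among unambiguous A/C/G/T bases (0.0 if there are none)."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def atypical_gc_windows(
    genome: str,
    window: int = 5000,       # assumed window length (bp)
    step: int = 1000,         # assumed step between window starts (bp)
    min_delta: float = 0.05,  # assumed minimum deviation from genome-wide GC
) -> List[Tuple[int, int, float]]:
    """Return (start, end, gc) for windows whose GC fraction differs from the
    whole-sequence GC fraction by at least `min_delta`.

    If the sequence is shorter than `window`, a single window covering the
    whole sequence is examined.
    """
    overall = gc_fraction(genome)
    hits: List[Tuple[int, int, float]] = []
    for start in range(0, max(len(genome) - window, 0) + 1, step):
        end = min(start + window, len(genome))
        gc = gc_fraction(genome[start:end])
        if abs(gc - overall) >= min_delta:
            hits.append((start, end, gc))
    return hits


if __name__ == "__main__":
    # Toy sequence: AT-rich backbone with a 1 kb GC-rich insert at 4000-5000.
    toy = "AT" * 2000 + "GC" * 500 + "AT" * 2000
    for start, end, gc in atypical_gc_windows(toy, window=1000, step=500, min_delta=0.3):
        print(f"{start}-{end}: GC = {gc:.2f}")
```

On the toy sequence, only the windows overlapping the GC-rich insert are reported. Real island detection usually combines GC deviation with other features, such as codon usage bias and flanking tRNA genes.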
To our knowledge, this is the first PAI candidate reported from S. suis worldwide. Our findings thus shed light on SS2-triggered STSS at the genomic level, further the understanding of its pathogenesis, and point toward effective strategies to combat highly pathogenic SS2 infections.
Frequent outbreaks of highly pathogenic avian influenza and the increasing data available for comparative analysis call for a central database specialized in influenza viruses (IVs). We have established the Influenza Virus Database (IVDB) to integrate information and create an analysis platform for genetic, genomic, and phylogenetic studies of the virus. IVDB hosts complete genome sequences of influenza A virus generated by the Beijing Institute of Genomics (BIG) and curates all other published IV sequences after expert annotation. Our Q-Filter system classifies and ranks all nucleotide sequences into seven categories according to sequence content and integrity. IVDB provides a series of tools and viewers for comparative analysis of viral genomes, genes, genetic polymorphisms and phylogenetic relationships. A search system allows users to retrieve combinations of different data types by setting search options. To facilitate analysis of global viral transmission and evolution, the IV Sequence Distribution Tool (IVDT) displays the worldwide geographic distribution of chosen viral genotypes and couples genomic data with epidemiological data. BLAST, multiple sequence alignment and phylogenetic analysis tools are integrated for online data analysis. Furthermore, IVDB offers instant access to pre-computed alignments and polymorphisms of IV genes and proteins, and presents the results as SNP distribution plots and minor allele distributions. IVDB is publicly available at
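The minor allele distributions mentioned above rest on a per-column count over a multiple sequence alignment. The sketch below shows one straightforward way to compute them; it is not IVDB's implementation, and the function name, the treatment of gaps and `N` as missing data, and the tie-breaking rule are assumptions.

```python
from collections import Counter
from typing import List, Optional, Tuple


def minor_allele_frequencies(
    alignment: List[str], missing: str = "-N"
) -> List[Optional[Tuple[str, float]]]:
    """For each column of an equal-length alignment, return
    (minor_allele, frequency) at polymorphic sites, or None when the column
    has fewer than two distinct bases.

    Characters listed in `missing` (assumed: gap and N) are ignored. The minor
    allele is the second most common base; ties keep the order of first
    appearance, as Counter.most_common does. Frequency is the minor-allele
    count divided by the number of non-missing bases in that column.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("all aligned sequences must have the same length")
    result: List[Optional[Tuple[str, float]]] = []
    for col in range(length):
        bases = [s[col].upper() for s in alignment if s[col].upper() not in missing]
        ranked = Counter(bases).most_common()
        if len(ranked) < 2:
            result.append(None)
        else:
            allele, count = ranked[1]
            result.append((allele, count / len(bases)))
    return result


if __name__ == "__main__":
    toy = ["ACGT", "ACGA", "ATGA", "AC-A"]
    for pos, maf in enumerate(minor_allele_frequencies(toy), start=1):
        print(pos, maf)
```

In the toy alignment, columns 2 and 4 are polymorphic with a minor allele `T` at frequency 0.25; column 3 counts only the three non-gap bases.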
3′-Phosphoadenosine-5′-phosphatase (PAPase) is required for the removal of toxic 3′-phosphoadenosine-5′-phosphate (PAP) produced during sulfur assimilation in various eukaryotic organisms. This enzyme is a well-known target of lithium and sodium toxicity and has been used to produce salt-resistant transgenic plants. PAPase has also been proposed as a target in the treatment of manic-depressive patients. A gene, halA, predicted to encode a protein closely related to the PAPases of yeasts and plants, was identified in the cyanobacterium Arthrospira (Spirulina) platensis. Phylogenetic analysis indicated that PAPase-related proteins from several cyanobacteria fall into different clades, suggesting multiple origins of PAPases in cyanobacteria. The HalA polypeptide from A. platensis was overproduced in Escherichia coli and used to characterize its biochemical properties. HalA was dependent on Mg2+ for its activity and could use PAP or 3′-phosphoadenosine-5′-phosphosulfate as a substrate. HalA is sensitive to Li+ (50% inhibitory concentration [IC50] = 3.6 mM) but only slightly sensitive to Na+ (IC50 = 600 mM). The salt sensitivity of HalA thus differs from that of most of its eukaryotic counterparts, which are much more sensitive to both Li+ and Na+, but is comparable to that of the PAPase AtAHL (Hal2p-like protein) from Arabidopsis thaliana. The properties of HalA may help elucidate the structure-function relationship underlying the salt sensitivity of PAPases. Expression of halA improved the Li+ tolerance of E. coli, suggesting that the sulfur assimilation pathway is likely a target of salt toxicity in bacteria as well.
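To put the two IC50 values in perspective, a standard single-site inhibition model, v/v0 = 1 / (1 + ([I]/IC50)^h), can estimate the residual activity at a given salt concentration. The sketch below assumes a Hill coefficient h = 1, which the abstract does not report, so the numbers are illustrative only; the function name is hypothetical.

```python
def residual_activity(inhibitor_mm: float, ic50_mm: float, hill: float = 1.0) -> float:
    """Fraction of uninhibited enzyme activity left at a given inhibitor
    concentration, using v/v0 = 1 / (1 + ([I]/IC50)**hill).

    hill = 1.0 is an assumption (simple single-site inhibition); the Hill
    coefficient for HalA is not given in the abstract.
    """
    if inhibitor_mm < 0 or ic50_mm <= 0:
        raise ValueError("inhibitor concentration must be >= 0 and IC50 > 0")
    return 1.0 / (1.0 + (inhibitor_mm / ic50_mm) ** hill)


# IC50 values reported for HalA in the abstract (mM).
IC50_LI_MM = 3.6
IC50_NA_MM = 600.0

if __name__ == "__main__":
    for conc in (1.0, 10.0, 100.0):
        li = residual_activity(conc, IC50_LI_MM)
        na = residual_activity(conc, IC50_NA_MM)
        print(f"{conc:6.1f} mM  Li+: {li:.2f}  Na+: {na:.2f}")
```

Under this assumption, 100 mM Na+ would leave roughly 86% of HalA activity, whereas 10 mM Li+ would leave only about a quarter, which matches the qualitative contrast described above.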
We describe an algorithm, ReAS, that recovers ancestral sequences of transposable elements (TEs) from the unassembled reads of a whole genome shotgun. The main assumptions are that these TEs must be present at high copy numbers across the genome and must not be so old that they are no longer recognizable relative to their ancestral sequences. Tested on the japonica rice genome, ReAS reconstructed all of the high-copy sequences in the Repbase repository of known TEs and increased the effectiveness of RepeatMasker in identifying TEs in genome sequences.
Transposable elements (TEs) are a major component of the genomes of multicellular organisms. They are parasitic creatures that invade the genome, insert multiple copies of themselves, and then die. All we see now are the decayed remnants of their ancestral sequences. Reconstructing these ancestral sequences can bring dead TEs back to life. Algorithms for detecting TEs compare present-day sequences to a library of ancestral sequences. Unknown to many, the pervasive use of whole genome shotgun (WGS) methods in large-scale sequencing has made TE reconstruction increasingly problematic. To minimize assembly errors, WGS methods must reject the highly repetitive sequences that characterize most TEs, especially the most recent TEs, which are the least diverged from their ancestral sequences (and the most informative for reconstruction). This is acceptable to many, because the most important parts of the genes are not repetitive, but for TE aficionados it is a problem. ReAS is a novel algorithm that reconstructs TEs using only the unassembled reads of a WGS. Tested against the WGS for japonica rice, it produced a library superior to the manually curated Repbase database of known ancestral TEs.
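The central premise of ReAS, that high-copy TEs can be recognized in unassembled reads by their abundance, can be illustrated by counting k-mer depth across all reads and keeping reads whose k-mers are frequent. This is a simplified sketch of that idea, not the ReAS implementation: the function names, k = 17, the `min_depth` cut-off and the median-depth rule are assumptions, and the later reconstruction steps (clustering the selected reads and building consensus sequences) are omitted.

```python
from collections import Counter
from statistics import median
from typing import Iterator, List

# Complement table; N maps to itself so ambiguous bases do not raise errors.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse
    complement, so copies on both strands are counted together."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield canonical overlapping k-mers of `seq` (none if len(seq) < k)."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield canonical(seq[i:i + k])


def high_copy_reads(reads: List[str], k: int = 17, min_depth: int = 10) -> List[int]:
    """Return indices of reads whose median k-mer depth is at least `min_depth`.

    Depth of a k-mer is the number of times it occurs across all reads.
    Reads sampled from a high-copy repeat share many k-mers and therefore
    have a high median depth; reads from single-copy regions stay near the
    sequencing coverage. Reads shorter than k are never selected.
    """
    depth = Counter(km for read in reads for km in kmers(read, k))
    selected: List[int] = []
    for idx, read in enumerate(reads):
        depths = [depth[km] for km in kmers(read, k)]
        if depths and median(depths) >= min_depth:
            selected.append(idx)
    return selected


if __name__ == "__main__":
    repeat = "ACGTTGCAAGGCTTACGATCG"  # toy "TE" read present in 12 copies
    reads = [repeat] * 12 + ["GGGGGGGGGG", "ATATATATAT"]
    print(high_copy_reads(reads, k=5, min_depth=10))
```

With twelve copies of the toy repeat read and two single reads, only the twelve repeat reads (indices 0 to 11) pass the depth cut-off.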