Cyclin-dependent kinases (Cdks) are key regulators of the cell division cycle. Pho85 is a multifunctional Cdk in budding yeast involved in aspects of metabolism, the cell cycle, cell polarity, and gene expression. Consistent with a broad spectrum of functions, Pho85 associates with a family of 10 cyclins and deletion of PHO85 causes a pleiotropic phenotype. Discovering the physiological substrates of protein kinases is a major challenge, and we have pursued a number of genomics approaches to reveal the processes regulated by Pho85 and to understand the root cause of reduced cellular fitness in pho85Δ mutant strains. We used a functional-genomics approach called synthetic genetic array (SGA) analysis to systematically identify strain backgrounds in which PHO85 is required for viability. In parallel, we used DNA microarrays to examine the genome-wide transcriptional consequences of deleting PHO85 or members of the Pho85 cyclin family. Using this pairwise approach coupled with phenotypic tests, we uncovered clear roles for Pho85 in cell integrity and the response to adverse growth conditions. Importantly, our combined approach allowed us to ascribe new aspects of the complex pho85 phenotype to particular cyclins; our data highlight a cell integrity function for the Pcl1,2 subgroup of Pho85 Cdks that is independent of a role for the Pho80-Pho85 kinase in the response to stress. Using a modification of the SGA technique to screen for suppressors of pho85Δ strain growth defects, we found that deletion of putative vacuole protein gene VTC4 suppressed the sensitivity of the pho85Δ strain to elevated CaCl2 and many other stress conditions. Expression of VTC4 is regulated by Pho4, a transcription factor that is inhibited by the Pho80-Pho85 kinase. 
Genetic tests and electron microscopy experiments suggest that VTC4 is a key target of Pho4 and that Pho80-Pho85-mediated regulation of VTC4 expression is required for proper vacuole function and for yeast cell survival under a variety of suboptimal conditions. The integration of multiple genomics approaches is likely to be a generally useful strategy for extracting functional information from pleiotropic mutant phenotypes.
Genetic interactions are highly informative for deciphering the underlying functional principles that govern how genes control cell processes. Recent developments in Synthetic Genetic Array (SGA) analysis enable the mapping of quantitative genetic interactions on a genome-wide scale. To facilitate access to this resource, which will ultimately represent a complete genetic interaction network for a eukaryotic cell, we developed DRYGIN (Data Repository of Yeast Genetic Interactions)—a web database system that aims at providing a central platform for yeast genetic network analysis and visualization. In addition to providing an interface for searching the SGA genetic interactions, DRYGIN also integrates other data sources, in order to associate the genetic interactions with pathway information, protein complexes, other binary genetic and physical interactions, and Gene Ontology functional annotation. DRYGIN version 1.0 currently holds more than 5.4 million measurements of genetic interacting pairs involving ∼4500 genes, and is available at http://drygin.ccbr.utoronto.ca
Hundreds of genomes have been successfully sequenced to date, and the data are publicly available. At the same time, the advances in large-scale expression and purification of recombinant proteins have paved the way for structural genomics efforts. Frequently, however, little is known about newly expressed proteins, calling for large-scale protein characterization to better understand their biochemical roles and to enable structure–function relationship studies. In the Structural Genomics Consortium (SGC), we have established a platform to characterize large numbers of purified proteins. This includes screening for ligands, enzyme assays, peptide arrays and peptide displacement in a 384-well format. In this review, we describe this platform in more detail and report on how our approach significantly increases the success rate for structure determination. Coupled with high-resolution X-ray crystallography and structure-guided methods, this platform can also be used toward the development of chemical probes through screening families of proteins against a variety of chemical series and focused chemical libraries.
Thermodenaturation; Protein stabilization; Ligand binding; Peptide array; Chemical probes
Barley (Hordeum vulgare), first domesticated in the Near East, is a well-studied crop in terms of genetics, genomics, and breeding and qualifies as a model plant for Triticeae research. Recent advances made in barley genomics mainly include the following: (i) rapid accumulation of EST sequence data, (ii) growing number of studies on transcriptome, proteome, and metabolome, (iii) new modeling techniques, (iv) availability of genome-wide knockout collections as well as efficient transformation techniques, and (v) the recently started genome sequencing effort. These developments pave the way for a comprehensive functional analysis and understanding of gene expression networks linked to agronomically important traits. Here, we selectively review important technological developments in barley genomics and related fields and discuss the relevance for understanding genotype-phenotype relationships by using approaches such as genetical genomics and association studies. High-throughput genotyping platforms that have recently become available will allow the construction of high-density genetic maps that will further promote marker-assisted selection as well as physical map construction. Systems biology approaches will further enhance our knowledge and largely increase our abilities to design refined breeding strategies on the basis of detailed molecular physiological knowledge.
Recent advances in microarray technology have made whole-genome association studies possible and greatly expanded the ability to map complex genetic traits across the human genome. Current mapping array technologies offer unprecedented resolution to examine single-nucleotide polymorphisms (SNPs) across the human genome. The resolution of these mapping arrays can be greatly increased by performing secondary screens using a targeted genotyping approach, both to further characterize SNPs discovered with the mapping arrays and to include SNPs that are determined to be of interest in candidate genes or other genetic areas of importance.
Over the last several months, we have examined 2200 patient samples from the Shanghai Breast Cancer Study cohort using a combination of Affymetrix 500k mapping arrays and a custom 1700-SNP targeted genotyping array. Beginning with 200 samples analyzed with the 500k mapping arrays and proceeding to 2200 samples analyzed with the custom targeted genotyping arrays, we will present data on the automation of assay protocols and data analysis schemes that greatly increases the efficiency and speed of the analysis. Association statistics as well as chromosome copy number information have been produced. Data will be presented to illustrate the importance of automation, accurate sample tracking, and examination of existing data from the HapMap project in evaluating and analyzing the resulting genotyping datasets.
Synthetic genetic array (SGA) analysis automates yeast genetics, enabling high-throughput construction of ordered arrays of double mutants. Quantitative colony sizes derived from SGA analysis can be used to measure cellular fitness and score for genetic interactions, such as synthetic lethality. Here we show that SGA colony sizes also can be used to obtain global maps of meiotic recombination because recombination frequency affects double-mutant formation for gene pairs located on the same chromosome and therefore influences the size of the resultant double-mutant colony. We obtained quantitative colony size data for ~1.2 million double mutants located on the same chromosome and constructed a genome-scale genetic linkage map at ~5 kb resolution. We found that our linkage map is reproducible and consistent with previous global studies of meiotic recombination. In particular, we confirmed that the total number of crossovers per chromosome tends to follow a simple linear model that depends on chromosome size. In addition, we observed a previously unappreciated relationship between the size of linkage regions surrounding each centromere and chromosome size, suggesting that crossovers tend to occur farther away from the centromere on larger chromosomes. The pericentric regions of larger chromosomes also appeared to load larger clusters of meiotic cohesin Rec8, and acquire fewer Spo11-catalyzed DNA double-strand breaks. Given that crossovers too near or too far from centromeres are detrimental to homolog disjunction and increase the incidence of aneuploidy, our data suggest that chromosome size may have a direct role in regulating the fidelity of chromosome segregation during meiosis.
synthetic genetic array (SGA); genomics; meiosis; recombination; centromere; genetic linkage; chromosome size; double strand breaks; Rec8; Spo11; yeast; Saccharomyces cerevisiae
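The link between double-mutant yield and genetic distance described in the abstract above can be illustrated with a small sketch. It is not the authors' method: it assumes that double-mutant yield, expressed relative to unlinked gene pairs, is proportional to the recombination fraction, and it uses the standard Haldane map function (which assumes no crossover interference). The function names are hypothetical.

```python
import math


def recombination_fraction(relative_yield):
    """Convert double-mutant yield (relative to unlinked pairs) to a recombination fraction.

    Assumption: unlinked pairs (r = 0.5) give a relative yield of 1.0,
    so r = 0.5 * relative_yield, clipped to the range [0, 0.5].
    """
    clipped = min(max(relative_yield, 0.0), 1.0)
    return 0.5 * clipped


def haldane_distance_cm(r):
    """Haldane map distance in centimorgans for recombination fraction r."""
    if r >= 0.5:
        return math.inf  # indistinguishable from unlinked loci
    return -50.0 * math.log(1.0 - 2.0 * r)


if __name__ == "__main__":
    # Toy relative yields, not measured data.
    for y in (0.05, 0.2, 0.6, 1.0):
        r = recombination_fraction(y)
        print(f"relative yield {y:.2f} -> r = {r:.3f}, d = {haldane_distance_cm(r):.1f} cM")
```

In practice colony size is only a proxy for double-mutant frequency, so any real analysis would need an empirical calibration between the two.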
Conditional temperature-sensitive (ts) mutations are valuable reagents for studying essential genes in the yeast Saccharomyces cerevisiae. We constructed 787 ts strains, covering 497 (~45%) of the 1,101 essential yeast genes, with ~30% of the genes represented by multiple alleles. All of the alleles are integrated into their native genomic locus in the S288C common reference strain and are linked to a kanMX selectable marker, allowing further genetic manipulation by synthetic genetic array (SGA)–based, high-throughput methods. We show two such manipulations: barcoding of 440 strains, which enables chemical-genetic suppression analysis, and the construction of arrays of strains carrying different fluorescent markers of subcellular structure, which enables quantitative analysis of phenotypes using high-content screening. Quantitative analysis of a GFP-tubulin marker identified roles for cohesin and condensin genes in spindle disassembly. This mutant collection should facilitate a wide range of systematic studies aimed at understanding the functions of essential genes.
A decade after the human genome sequence, most vertebrate gene functions remain poorly understood, limiting benefits to human health from rapidly advancing genomic technologies. Systematic in vivo functional analysis is ideally suited to the experimentally accessible Xenopus embryo, which combines embryological accessibility with a broad range of transgenic, biochemical and gain-of-function assays. The diploid X. tropicalis adds loss-of-function genetics and enhanced genomics to this repertoire. In the last decade diverse phenotypes have been recovered from genetic screens, mutations have been cloned, and reverse genetics in the form of TILLING and targeted gene editing have been established. Simple haploid genetics and gynogenesis and the very large number of embryos produced streamline screening and mapping. Improved genomic resources and the revolution in high-throughput sequencing are transforming mutation cloning and reverse genetic approaches. The combination of loss-of-function mutant backgrounds with the diverse array of conventional Xenopus assays offers a uniquely flexible platform for analysis of gene function in vertebrate development.
amphibian; early development; genetics; organogenesis; Xenopus; tropicalis
Plant breeding has been very successful in developing improved varieties using conventional tools and methodologies. Nowadays, the availability of genomic tools and resources is leading to a new revolution in plant breeding, as they facilitate the study of the genotype and its relationship with the phenotype, in particular for complex traits. Next Generation Sequencing (NGS) technologies are allowing the mass sequencing of genomes and transcriptomes, which is producing a vast array of genomic information. The analysis of NGS data by means of bioinformatics developments allows discovering new genes and regulatory sequences and their positions, and makes available large collections of molecular markers. Genome-wide expression studies provide breeders with an understanding of the molecular basis of complex traits. Genomic approaches include TILLING and EcoTILLING, which make it possible to screen mutant and germplasm collections for allelic variants in target genes. Re-sequencing of genomes is very useful for the genome-wide discovery of markers amenable to high-throughput genotyping platforms, like SSRs and SNPs, or the construction of high-density genetic maps. All these tools and resources facilitate studying genetic diversity, which is important for germplasm management, enhancement and use. Also, they allow the identification of markers linked to genes and QTLs, using a diversity of techniques like bulked segregant analysis (BSA), fine genetic mapping, or association mapping. These new markers are used for marker assisted selection, including marker assisted backcross selection, ‘breeding by design’, or new strategies, like genomic selection. In conclusion, advances in genomics are providing breeders with new tools and methodologies that allow a great leap forward in plant breeding, including the ‘superdomestication’ of crops and the genetic dissection and breeding for complex traits.
Bioinformatics; complex traits; genetic maps; marker assisted selection; molecular markers; next-generation-sequencing; quantitative trait loci.
Transfected cell arrays (TCAs) represent a high-throughput technique to correlate gene expression with functional cell responses. Despite advances in TCAs, improvements are needed for the widespread application of this technology. We have developed a TCA that combines a two-plasmid system and dual-bioluminescence imaging to quantitatively normalize for variability in transfection and increase sensitivity. The two plasmids consist of: (i) a normalization plasmid present within each spot, and (ii) a functional plasmid that varies between spots and is responsible for the functional endpoint of the array. Bioluminescence imaging of dual-luciferase reporters (renilla, firefly luciferase) provides sensitive and quantitative detection of cellular response, with minimal post-transfection processing. The array was applied to quantify estrogen receptor α (ERα) activity in MCF-7 breast cancer cells. A plasmid containing an ERα-regulated promoter directing firefly luciferase expression was mixed with a normalization plasmid, complexed with cationic lipids and deposited into an array. ER induction mimicked results obtained through traditional assay methods, with estrogen inducing luciferase expression 10-fold over the antiestrogen fulvestrant or vehicle. Furthermore, the array captured a dose response to estrogen, demonstrating the sensitivity of bioluminescence quantification. This system provides a tool for basic science research, with potential application for the development of patient-specific therapies.
transfected cell array; bioluminescence imaging; substrate-mediated gene delivery; estrogen receptor; breast cancer
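The two-plasmid normalization described in the abstract above can be sketched as a ratio calculation. This is a minimal illustration, assuming the firefly signal is the functional readout and the renilla signal is the per-spot normalization readout; the function names and the intensity values are hypothetical, not data from the study.

```python
def normalized_activity(firefly, renilla):
    """Functional (firefly) signal divided by the normalization (renilla) signal of one spot."""
    if renilla <= 0:
        raise ValueError("normalization signal must be positive")
    return firefly / renilla


def fold_induction(treated_spots, control_spots):
    """Ratio of mean normalized activity in treated spots to that in control spots.

    Each spot is a (firefly, renilla) tuple of background-corrected intensities.
    """
    def mean_ratio(spots):
        ratios = [normalized_activity(f, r) for f, r in spots]
        return sum(ratios) / len(ratios)

    return mean_ratio(treated_spots) / mean_ratio(control_spots)


if __name__ == "__main__":
    # Toy intensities chosen for illustration only.
    estrogen = [(5000, 500), (3000, 300)]
    vehicle = [(1000, 1000), (500, 500)]
    print(fold_induction(estrogen, vehicle))
```

Dividing by the normalization signal before averaging means that a spot with poor transfection contributes a ratio rather than a raw intensity, which is the point of including the second plasmid in every spot.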
Potato, a highly heterozygous tetraploid, is undergoing an exciting phase of genomics resource development. The potato research community has established extensive genomic resources, such as large expressed sequence tag (EST) data collections, microarrays and other expression profiling platforms, and large-insert genomic libraries. Moreover, potato will now benefit from a global potato physical mapping effort, which is serving as the underlying resource for a full potato genome sequencing project, now well underway. These tools and resources are having a major impact on potato breeding and genetics. The genome sequence will provide an invaluable comparative genomics resource for cross-referencing to the other Solanaceae, notably tomato, whose sequence is also being determined. Most importantly perhaps, a potato genome sequence will pave the way for the functional analysis of the large numbers of potato genes that await discovery. Potato, being easily transformable, is highly amenable to the investigation of gene function by biotechnological approaches. Recent advances in the development of Virus Induced Gene Silencing (VIGS) and related methods will facilitate rapid progress in the analysis of gene function in this important crop.
While the term flow cytometry refers to the measurement of cells, the approach of making sensitive multiparameter optical measurements in a flowing sample stream is a very general analytical approach. The past few years have seen an explosion in the application of flow cytometry technology for molecular analysis and measurements using microparticles as solid supports. While microsphere-based molecular analyses using flow cytometry date back three decades, the need for highly parallel quantitative molecular measurements that has arisen from various genomic and proteomic advances has driven the development of particle encoding technology to enable highly multiplexed assays. Multiplexed particle-based immunoassays are now commonplace, and new assays to study genes, protein function, and molecular assembly are emerging. Numerous efforts are underway to extend the multiplexing capabilities of microparticle-based assays through new approaches to particle encoding and analyte reporting. The impact of these developments will be seen in basic research and clinical laboratories, as well as in drug development.
microarray; systems biology; proteomics; protein array; high throughput screening; drug discovery; diagnostics
For the past decade, the development of genomic technology has revolutionized modern biological research and drug discovery. Functional genomic analyses enable biologists to analyze genetic events on a global scale, and they have been widely used in gene discovery, biomarker determination, disease classification, and drug target identification. In this article, we provide an overview of the current and emerging tools involved in genomic studies, including expression arrays, microRNA arrays, array CGH, ChIP-on-chip, methylation arrays, mutation analysis, genome-wide association studies, proteomic analysis, integrated functional genomic analysis and related bioinformatic and biostatistical analyses. Using human liver cancer as an example, we provide further information on how these genomic approaches can be applied in cancer research.
Functional genomics; arrays; cancer
Recent advances in genomic and post-genomic technologies have facilitated a genome-wide analysis of the insecticide resistance-associated genes in insects. Through transcriptome analysis of the bed bug, Cimex lectularius, we identified 14 molecular markers associated with pyrethroid resistance. Our studies revealed that most of the resistance-associated genes functioning in diverse mechanisms are expressed in the epidermal layer of the integument, which could prevent or slow down the toxin from reaching the target sites on nerve cells, where an additional layer of resistance (kdr) is possible. This strategy, which evolved in bed bugs, is based on their unique morphological, physiological and behavioral characteristics and has not been reported in any other insect species. RNA interference-aided knockdown of resistance-associated genes showed the relative contribution of each mechanism towards overall resistance development. Understanding the complexity of adaptive strategies employed by bed bugs will help in designing the most effective and sustainable bed bug control methods.
Tra1 is an essential 437-kDa component of the Saccharomyces cerevisiae SAGA/SLIK and NuA4 histone acetyltransferase complexes. It is a member of a group of key signaling molecules that share a carboxyl-terminal domain related to phosphatidylinositol-3-kinase but unlike many family members, it lacks kinase activity. To identify genetic interactions for TRA1 and provide insight into its function we have performed a systematic genetic array analysis (SGA) on tra1SRR3413, an allele that is defective in transcriptional regulation.
The SGA analysis revealed 114 synthetic slow growth/lethal (SSL) interactions for tra1SRR3413. The interacting genes are involved in a range of cellular processes including gene expression, mitochondrial function, and membrane sorting/protein trafficking. In addition, many of the genes have roles in the cellular response to stress. A hierarchical cluster analysis revealed that the pattern of SSL interactions for tra1SRR3413 most closely resembles deletions of a group of regulatory GTPases required for membrane sorting/protein trafficking. Consistent with a role for Tra1 in cellular stress, the tra1SRR3413 strain was sensitive to rapamycin. In addition, calcofluor white sensitivity of the strain was enhanced by the protein kinase inhibitor staurosporine, a phenotype shared with the Ada components of the SAGA/SLIK complex. Through analysis of a GFP-Tra1 fusion we show that Tra1 is principally localized to the nucleus.
We have demonstrated a genetic association of Tra1 with nuclear, mitochondrial and membrane processes. The identity of the SSL genes also connects Tra1 with cellular stress, a result confirmed by the sensitivity of the tra1SRR3413 strain to a variety of stress conditions. Based upon the nuclear localization of GFP-Tra1 and the finding that deletion of the Ada components of the SAGA complex results in phenotypes similar to those of tra1SRR3413, we suggest that the effects of tra1SRR3413 are mediated, at least in part, through its role in the SAGA complex.
DNA methylation is one of the most important heritable epigenetic modifications of the genome and is involved in the regulation of many cellular processes. Aberrant DNA methylation has been frequently reported to influence gene expression and subsequently cause various human diseases, including cancer. Recent rapid advances in next-generation sequencing technologies have enabled investigators to profile genome methylation patterns at single-base resolution. Remarkably, more than 20 eukaryotic methylomes have been generated thus far, with a majority published since November 2009. Analysis of this vast amount of data has dramatically enriched our knowledge of biological function, conservation and divergence of DNA methylation in eukaryotes. Even so, many specific functions of DNA methylation and their underlying regulatory systems remain unknown. Here, we briefly introduce current approaches for DNA methylation profiling and then systematically review the features of whole genome DNA methylation patterns in eight animals, six plants and five fungi. Our systematic comparison provides new insights into the conservation and divergence of DNA methylation in eukaryotes and their regulation of gene expression. This work aims to provide an informative summary of the current state of available methylome data and their features.
DNA methylation; methylome; single-base resolution; CpG; gene body; broadness; deepness; promoter
Azoxymethane (AOM) or 1,2-dimethylhydrazine (DMH)-induced colon carcinogenesis in rats shares many phenotypical similarities with human sporadic colon cancer and is a reliable model for identifying chemopreventive agents. Genetic mutations relevant to human colon cancer have been described in this model, but comprehensive gene expression and genomic analysis have not been reported so far. Therefore, we applied genome-wide technologies to study variations in gene expression and genomic alterations in DMH-induced colon cancer in F344 rats.
For gene expression analysis, 9 tumours (TUM) and their paired normal mucosa (NM) were hybridized on 4 × 44K Whole rat arrays (Agilent) and selected genes were validated by semi-quantitative RT-PCR. Functional analysis on microarray data was performed by GenMAPP/MappFinder analysis. Array-comparative genomic hybridization (a-CGH) was performed on 10 paired TUM-NM samples hybridized on Rat genome arrays 2 × 105K (Agilent) and the results were analyzed by CGH Analytics (Agilent).
Microarray gene expression analysis showed that Defcr4, Igfbp5, Mmp7, Nos2, S100A8 and S100A9 were among the most up-regulated genes in tumours (Fold Change (FC) compared with NM: 183, 48, 39, 38, 36 and 32, respectively), while Slc26a3, Mptx, Retlna and Muc2 were strongly down-regulated (FC: -500, -376, -167 and -79, respectively). Functional analysis showed that pathways controlling cell cycle, protein synthesis, matrix metalloproteinases, TNFα/NFkB, and inflammatory responses were up-regulated in tumours, while Krebs cycle, the electron transport chain, and fatty acid beta oxidation were down-regulated. a-CGH analysis showed that four TUM out of ten had one or two chromosomal aberrations. Importantly, one sample showed a deletion on chromosome 18 including Apc.
The results showed complex gene expression alterations in adenocarcinomas encompassing many altered pathways. While a-CGH analysis showed a low degree of genomic imbalance, it is interesting to note that one of the alterations concerned Apc, a key gene in colorectal carcinogenesis. The fact that many of the molecular alterations described in this study are documented in human colon tumours confirms the relevance of DMH-induced cancers as a powerful tool for the study of colon carcinogenesis and chemoprevention.
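The signed fold changes reported in the results above can be placed on a symmetric scale by converting them to log2 ratios. The sketch below assumes the common convention that a negative fold change of -x denotes an x-fold decrease (a tumour/normal ratio of 1/x); the abstract does not define its convention, so this is an assumption, and the function name is hypothetical.

```python
import math


def signed_fc_to_log2(fc):
    """Convert a signed fold change to a log2 ratio.

    Assumed convention: fc >= 1 means an fc-fold increase and
    fc <= -1 means an |fc|-fold decrease. Values between -1 and 1
    are not defined under this convention.
    """
    if abs(fc) < 1:
        raise ValueError("signed fold change must have magnitude >= 1")
    return math.log2(fc) if fc > 0 else -math.log2(-fc)


if __name__ == "__main__":
    # Values taken from the fold changes quoted in the abstract.
    for gene, fc in [("Defcr4", 183), ("S100A9", 32), ("Muc2", -79), ("Slc26a3", -500)]:
        print(f"{gene}: FC {fc} -> log2 ratio {signed_fc_to_log2(fc):.2f}")
```

On the log2 scale an x-fold increase and an x-fold decrease have equal magnitude and opposite sign, which makes up- and down-regulated genes directly comparable.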
Large-scale quantitative analysis of transcriptional co-expression has been used to dissect regulatory networks and to predict the functions of new genes discovered by genome sequencing in model organisms such as yeast. Although the idea that tissue-specific expression is indicative of gene function in mammals is widely accepted, it has not been objectively tested nor compared with the related but distinct strategy of correlating gene co-expression as a means to predict gene function.
We generated microarray expression data for nearly 40,000 known and predicted mRNAs in 55 mouse tissues, using custom-built oligonucleotide arrays. We show that quantitative transcriptional co-expression is a powerful predictor of gene function. Hundreds of functional categories, as defined by Gene Ontology 'Biological Processes', are associated with characteristic expression patterns across all tissues, including categories that bear no overt relationship to the tissue of origin. In contrast, simple tissue-specific restriction of expression is a poor predictor of which genes are in which functional categories. As an example, the highly conserved mouse gene PWP1 is widely expressed across different tissues but is co-expressed with many RNA-processing genes; we show that the uncharacterized yeast homolog of PWP1 is required for rRNA biogenesis.
We conclude that 'functional genomics' strategies based on quantitative transcriptional co-expression will be as fruitful in mammals as they have been in simpler organisms, and that transcriptional control of mammalian physiology is more modular than is generally appreciated. Our data and analyses provide a public resource for mammalian functional genomics.
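The idea of predicting function from quantitative co-expression across tissues, as described in the abstract above, can be sketched with a simple nearest-neighbour vote. This is an illustration only and not the study's actual classifier: it assumes Pearson correlation as the similarity measure and a majority vote among the most correlated annotated genes, and all gene names, profiles and categories are made up.

```python
from collections import Counter
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("profile has zero variance")
    return cov / (sd_x * sd_y)


def predict_category(query_profile, annotated, top_k=3):
    """Majority category among the top_k annotated genes most correlated with the query.

    annotated maps gene name -> (expression profile, functional category).
    """
    ranked = sorted(
        annotated.values(),
        key=lambda entry: pearson(query_profile, entry[0]),
        reverse=True,
    )
    votes = Counter(category for _, category in ranked[:top_k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy profiles over four tissues; values are illustrative only.
    annotated = {
        "geneA": ([1, 2, 3, 4], "RNA processing"),
        "geneB": ([2, 4, 6, 9], "RNA processing"),
        "geneC": ([4, 3, 2, 1], "lipid metabolism"),
    }
    print(predict_category([1, 2, 3, 5], annotated, top_k=2))
```

Note that the query in this toy case is expressed in every tissue, so a rule based on tissue-specific restriction would have nothing to go on, whereas the shape of the profile still identifies its co-expressed neighbours.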
Gene tagging facilitates systematic genomic and proteomic analyses but chromosomal tagging typically disrupts gene regulatory sequences. Here we describe a seamless gene tagging approach that preserves endogenous gene regulation and is potentially applicable in any species with efficient DNA double-strand break repair by homologous recombination. We implement seamless tagging in Saccharomyces cerevisiae and demonstrate its application for protein tagging while preserving simultaneously upstream and downstream gene regulatory elements. Seamless tagging is compatible with high-throughput strain construction using synthetic genetic arrays (SGA), enables functional analysis of transcription antisense to open reading frames and should facilitate systematic and minimally-invasive analysis of gene functions.
Global quantitative analysis of genetic interactions is a powerful approach for deciphering the roles of genes and mapping functional relationships among pathways. Using colony size as a proxy for fitness, we developed a method for measuring fitness-based genetic interactions from high-density arrays of yeast double mutants generated by synthetic genetic array (SGA) analysis. We identified several experimental sources of systematic variation and developed normalization strategies to obtain accurate single- and double-mutant fitness measurements, which rival the accuracy of other high-resolution studies. We applied the SGA score to examine the relationship between physical and genetic interaction networks, and we found that positive genetic interactions connect across functionally distinct protein complexes revealing a network of genetic suppression among loss-of-function alleles.
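A fitness-based interaction score of the kind described above can be sketched as the deviation of the observed double-mutant fitness from an expectation built from the two single mutants. The abstract does not state the null model, so the multiplicative expectation used here is an assumption (it is a widely used choice); the function names, the threshold and the colony sizes are hypothetical, and the normalization steps mentioned in the abstract are not modelled.

```python
from statistics import median


def relative_fitness(colony_sizes, reference_size):
    """Median colony size of replicate colonies divided by a wild-type reference size."""
    if reference_size <= 0:
        raise ValueError("reference size must be positive")
    return median(colony_sizes) / reference_size


def interaction_score(f_a, f_b, f_ab):
    """Observed double-mutant fitness minus the multiplicative expectation f_a * f_b."""
    return f_ab - f_a * f_b


def classify(score, threshold=0.08):
    """Label an interaction; the threshold is an arbitrary illustrative cut-off."""
    if score <= -threshold:
        return "negative"
    if score >= threshold:
        return "positive"
    return "neutral"


if __name__ == "__main__":
    # Toy colony sizes in pixels, not measured data.
    wild_type = 1000.0
    f_a = relative_fitness([790, 800, 820], wild_type)
    f_b = relative_fitness([480, 500, 510], wild_type)
    f_ab = relative_fitness([90, 100, 120], wild_type)
    score = interaction_score(f_a, f_b, f_ab)
    print(f"score = {score:.2f} ({classify(score)})")
```

Under this model a double mutant that grows worse than expected gets a negative score (synthetic sick or lethal), and one that grows better than expected gets a positive score, the class that includes suppression.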
DNA arrays and chips are powerful new tools for gene expression profiling. Current arrays contain hundreds or thousands of probes and large scale sequencing and screening projects will likely lead to the creation of global genomic arrays. DNA arrays and chips will be key in understanding how genes respond to specific changes of environment and will also greatly assist in drug discovery and molecular diagnostics. To facilitate widespread realization of the quantitative potential of this approach, we have designed procedures and software which facilitate analysis of autoradiography films with accuracy comparable to phosphorimaging devices. Algorithms designed for analysis of DNA array autoradiographs incorporate 3-D peak fitting of features on films and estimation of local backgrounds. This software has a flexible grid geometry and can be applied to different types of DNA arrays, including custom arrays.
The use of microarray technology to measure gene expression on a genome-wide scale has been well established for more than a decade. Methods to process and analyse the vast quantity of expression data generated by a typical microarray experiment are similarly well-established. The Affymetrix Exon 1.0 ST array is a relatively new type of array, which has the capability to assess expression at the individual exon level. This allows a more comprehensive analysis of the transcriptome, and in particular enables the study of alternative splicing, a gene regulation mechanism important in both normal conditions and in diseases. Some aspects of exon array data analysis are shared with those for standard gene expression data but others present new challenges that have required development of novel tools. Here, I will introduce the exon array and present a detailed example tutorial for analysis of data generated using this platform.
exon array; affymetrix; alternative splicing; microarray; gene expression
Functional genomics has emerged over the past ten years as a novel technology to study genetic alterations. Gene expression arrays are one genomic technique employed to discover changes in gene expression that occur in neoplastic transformation. Microarrays have been applied to investigating lung cancer. Specific applications include discovering novel genetic changes that occur in lung tumors. Microarrays can also be applied to improve diagnosis and staging and to discover prognostic markers. The eventual goal of this technology is to discover new markers for therapy and to customize therapy based on an individual tumor's genetic composition. In this review, we present the current state of gene expression array technology in its application to lung cancer.
Lung cancer; genomics; gene expression arrays; gene expression profiling; diagnosis; staging; prognosis; treatment; therapy; management
Profound changes are occurring in the strategies that biotechnology-based industries are deploying in the search for exploitable biology and to discover new products and develop new or improved processes. The advances that have been made in the past decade in areas such as combinatorial chemistry, combinatorial biosynthesis, metabolic pathway engineering, gene shuffling, and directed evolution of proteins have caused some companies to consider withdrawing from natural product screening. In this review we examine the paradigm shift from traditional biology to bioinformatics that is revolutionizing exploitable biology. We conclude that the reinvigorated means of detecting novel organisms, novel chemical structures, and novel biocatalytic activities will ensure that natural products will continue to be a primary resource for biotechnology. The paradigm shift has been driven by a convergence of complementary technologies, exemplified by DNA sequencing and amplification, genome sequencing and annotation, proteome analysis, and phenotypic inventorying, resulting in the establishment of huge databases that can be mined in order to generate useful knowledge such as the identity and characterization of organisms and the identity of biotechnology targets. Concurrently there have been major advances in understanding the extent of microbial diversity, how uncultured organisms might be grown, and how expression of the metabolic potential of microorganisms can be maximized. The integration of information from complementary databases presents a significant challenge. Such integration should facilitate answers to complex questions involving sequence, biochemical, physiological, taxonomic, and ecological information of the sort posed in exploitable biology. 
The paradigm shift which we discuss is not absolute in the sense that it will replace established microbiology; rather, it reinforces our view that innovative microbiology is essential for releasing the potential of microbial diversity for biotechnology penetration throughout industry. Various of these issues are considered with reference to deep-sea microbiology and biotechnology.
Recent advances in technologies for observing high-resolution genomic activities, such as whole-genome tiling arrays and high-throughput sequencers, provide detailed information for understanding genome functions. However, the functions of 50% of known Arabidopsis thaliana genes remain unknown or are annotated only on the basis of static analyses such as protein motifs or similarities. In this paper, we describe dynamic structure-based dynamic expression (DSDE) analysis, which sequentially predicts both structural and functional features of transcripts. We show that DSDE analysis inferred gene functions 12% more precisely than static structure-based dynamic expression (SSDE) analysis or conventional co-expression analysis based on previously determined gene structures of A. thaliana. This result suggests that more precise structural information than the fixed conventional annotated structures is crucial for co-expression analysis in systems biology of transcriptional regulation and dynamics. Our DSDE method, ARabidopsis Tiling-Array-based Detection of Exons version 2 and over-representation analysis (ARTADE2-ORA), precisely predicts each gene structure by combining two statistical analyses: a probe-wise co-expression analysis of multiple transcriptome measurements and a Markov model analysis of genome sequences. ARTADE2-ORA successfully identified the true functions of about 90% of functionally annotated genes, inferred the functions of 98% of functionally unknown genes and predicted 1,489 new gene structures and functions. We developed a database ARTADE2DB that integrates not only the information predicted by ARTADE2-ORA but also annotations and other functional information, such as phenotypes and literature citations, and is expected to contribute to the study of the functional genomics of A. thaliana. URL: http://artade.org.
Arabidopsis thaliana; Database; Function prediction; Genome tiling array; Unknown genes