Results 1-25 (69)

1.  Genome dynamics of the human embryonic kidney 293 lineage in response to cell biology manipulations 
Nature Communications  2014;5:4767.
The HEK293 human cell lineage is widely used in cell biology and biotechnology. Here we use whole-genome resequencing of six 293 cell lines to study the dynamics of this aneuploid genome in response to the manipulations used to generate common 293 cell derivatives, such as transformation and stable clone generation (293T); suspension growth adaptation (293S); and cytotoxic lectin selection (293SG). Remarkably, we observe that copy number alteration detection could identify the genomic region that enabled cell survival under selective conditions (i.e. ricin selection). Furthermore, we present methods to detect human/vector genome breakpoints and a user-friendly visualization tool for the 293 genome data. We also establish that the genome structure composition is in steady state for most of these cell lines when standard cell culturing conditions are used. This resource enables novel and more informed studies with 293 cells, and we will distribute the sequenced cell lines to this effect.
The human embryonic kidney 293 (HEK293) cell lineage is widely used in cell biology and biotechnology. Here, the authors apply whole genome resequencing methods to characterise genomic variation in six HEK293 cell lines and suggest that this variation could affect experiments using these cell lines.
PMCID: PMC4166678  PMID: 25182477
2.  Comparative in silico analysis of EST-SSRs in angiosperm and gymnosperm tree genera 
BMC Plant Biology  2014;14(1):220.
Simple Sequence Repeats (SSRs) derived from Expressed Sequence Tags (ESTs) belong to the expressed fraction of the genome and are important for gene regulation, recombination, DNA replication, cell cycle and mismatch repair. Here, we present a comparative analysis of the SSR motif distribution in the 5′UTR, ORF and 3′UTR fractions of ESTs across selected genera of woody trees representing gymnosperms (17 species from seven genera) and angiosperms (40 species from eight genera).
Our analysis supports a modest contribution of EST-SSR length to genome size in gymnosperms, while EST-SSR density was not associated with genome size in either angiosperms or gymnosperms. Multiple factors seem to have contributed to the lower abundance of EST-SSRs in gymnosperms that has resulted in a non-linear relationship with genome size diversity. The AG/CT motif was found to be the most abundant in SSRs of both angiosperms and gymnosperms, with a relative increase in AT/AT in the latter. Our data also reveal a higher abundance of hexamers across the gymnosperm genera.
Our analysis provides the foundation for future comparative studies at the species level to unravel the evolutionary processes that control the SSR genesis and divergence between angiosperm and gymnosperm tree species.
Electronic supplementary material
The online version of this article (doi:10.1186/s12870-014-0220-8) contains supplementary material, which is available to authorized users.
PMCID: PMC4160553  PMID: 25143005
Angiosperms; Gymnosperms; Expressed sequence tags; Simple sequence repeats (SSR); Microsatellites
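The motif counts reported in this abstract (for example AG/CT versus AT/AT) rest on detecting tandem repeats and grouping motifs that differ only by rotation or strand. The following is a minimal sketch of that idea, not the authors' pipeline: the function names `canonical_motif` and `find_ssrs`, the thresholds, and the example sequence are all assumptions made for illustration.

```python
import re
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_motif(motif):
    """Return a rotation- and strand-independent label such as 'AG/CT'."""
    def min_rotation(s):
        return min(s[i:] + s[:i] for i in range(len(s)))

    forward = min_rotation(motif)
    reverse = min_rotation(motif.translate(_COMPLEMENT)[::-1])
    return "/".join(sorted((forward, reverse)))


def find_ssrs(seq, min_unit=2, max_unit=6, min_repeats=4):
    """Yield (start, unit, repeat_count) for perfect tandem repeats.

    Hypothetical thresholds: units of 2-6 nt repeated at least 4 times.
    The lazy quantifier makes the regex prefer the shortest repeating unit.
    """
    seq = seq.upper()
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    for match in pattern.finditer(seq):
        unit = match.group(1)
        if len(set(unit)) == 1:
            continue  # skip homopolymer runs reported as longer units
        yield match.start(), unit, len(match.group(0)) // len(unit)


# Illustrative sequence (made up): one GA repeat and one AT repeat.
example = "TTGAGAGAGAGAGCCATATATATATGG"
motif_counts = Counter(canonical_motif(unit) for _, unit, _ in find_ssrs(example))
print(motif_counts)
```

Real EST-SSR surveys typically also handle imperfect repeats and assign each repeat to the 5′UTR, ORF or 3′UTR, which this sketch does not attempt.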
3.  Tangled up in two: a burst of genome duplications at the end of the Cretaceous and the consequences for plant evolution 
Genome sequencing has demonstrated that besides frequent small-scale duplications, large-scale duplication events such as whole genome duplications (WGDs) are found on many branches of the evolutionary tree of life. Especially in the plant lineage, there is evidence for recurrent WGDs, and the ancestor of all angiosperms was in fact most likely a polyploid species. The number of WGDs found in sequenced plant genomes allows us to investigate questions about the roles of WGDs that were hitherto impossible to address. An intriguing observation is that many plant WGDs seem associated with periods of increased environmental stress and/or fluctuations, a trend that is evident for both present-day polyploids and palaeopolyploids formed around the Cretaceous–Palaeogene (K–Pg) extinction at 66 Ma. Here, we revisit the WGDs in plants that mark the K–Pg boundary, and discuss some specific examples of biological innovations and/or diversifications that may be linked to these WGDs. We review evidence for the processes that could have contributed to increased polyploid establishment at the K–Pg boundary, and discuss the implications on subsequent plant evolution in the Cenozoic.
PMCID: PMC4071526  PMID: 24958926
whole genome duplication; K–Pg boundary; extinction event; innovation; speciation; plant evolution
4.  The Mycobacterium tuberculosis regulatory network and hypoxia 
Nature  2013;499(7457):178-183.
We have taken the first steps towards a complete reconstruction of the Mycobacterium tuberculosis regulatory network based on ChIP-Seq and combined this reconstruction with system-wide profiling of messenger RNAs, proteins, metabolites and lipids during hypoxia and re-aeration. Adaptations to hypoxia are thought to have a prominent role in M. tuberculosis pathogenesis. Using ChIP-Seq combined with expression data from the induction of the same factors, we have reconstructed a draft regulatory network based on 50 transcription factors. This network model revealed a direct interconnection between the hypoxic response, lipid catabolism, lipid anabolism and the production of cell wall lipids. As a validation of this model, in response to oxygen availability we observe substantial alterations in lipid content and changes in gene expression and metabolites in corresponding metabolic pathways. The regulatory network reveals transcription factors underlying these changes, allows us to computationally predict expression changes, and indicates that Rv0081 is a regulatory hub.
PMCID: PMC4087036  PMID: 23823726
5.  Improving the Adaptability of Simulated Evolutionary Swarm Robots in Dynamically Changing Environments 
PLoS ONE  2014;9(3):e90695.
One of the important challenges in the field of evolutionary robotics is the development of systems that can adapt to a changing environment. However, the ability to adapt to unknown and fluctuating environments is not straightforward. Here, we explore the adaptive potential of simulated swarm robots that contain a genomic encoding of a bio-inspired gene regulatory network (GRN). An artificial genome is combined with a flexible agent-based system, representing the activated part of the regulatory network that transduces environmental cues into phenotypic behaviour. Using an artificial life simulation framework that mimics a dynamically changing environment, we show that separating the static from the conditionally active part of the network contributes to a better adaptive behaviour. Furthermore, in contrast with most hitherto developed ANN-based systems that need to re-optimize their complete controller network from scratch each time they are subjected to novel conditions, our system uses its genome to store GRNs whose performance was optimized under a particular environmental condition for a sufficiently long time. When subjected to a new environment, the previous condition-specific GRN might become inactivated, but remains present. This ability to store ‘good behaviour’ and to disconnect it from the novel rewiring that is essential under a new condition allows faster re-adaptation if any of the previously observed environmental conditions is reencountered. As we show here, applying these evolutionary-based principles leads to accelerated and improved adaptive evolution in a non-stable environment.
PMCID: PMC3944896  PMID: 24599485
6.  TRAPID: an efficient online tool for the functional and comparative analysis of de novo RNA-Seq transcriptomes 
Genome Biology  2013;14(12):R134.
Transcriptome analysis through next-generation sequencing technologies allows the generation of detailed gene catalogs for non-model species, at the cost of new challenges with regard to computational requirements and bioinformatics expertise. Here, we present TRAPID, an online tool for the fast and efficient processing of assembled RNA-Seq transcriptome data, developed to mitigate these challenges. TRAPID offers high-throughput open reading frame detection, frameshift correction and includes a functional, comparative and phylogenetic toolbox, making use of 175 reference proteomes. Benchmarking and comparison against state-of-the-art transcript analysis tools reveals the efficiency and unique features of the TRAPID system. TRAPID is freely available at
PMCID: PMC4053847  PMID: 24330842
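To make "open reading frame detection" concrete, here is a minimal sketch that scans the three forward frames of a transcript for the longest ATG-to-stop stretch. It is not TRAPID's method, which the abstract does not describe in detail; the name `longest_orf`, the forward-strand-only scan and the example sequence are assumptions for illustration.

```python
# Stop codons of the standard genetic code.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Return (start, end, frame) of the longest ATG..stop ORF, or None.

    Coordinates are 0-based and end-exclusive, and include the stop codon.
    Only the forward strand is scanned in this sketch.
    """
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if best is None or (i + 3 - start) > (best[1] - best[0]):
                    best = (start, i + 3, frame)
                start = None  # close the ORF and look for the next start
    return best


# Illustrative transcript (made up) with one ORF in frame 2.
transcript = "CCATGAAATTTTAGGG"
hit = longest_orf(transcript)
if hit is not None:
    print(transcript[hit[0]:hit[1]])
```

A production tool would also scan the reverse complement and tolerate partial ORFs at transcript ends, which matters for fragmented de novo assemblies.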
7.  The Complex Intron Landscape and Massive Intron Invasion in a Picoeukaryote Provides Insights into Intron Evolution 
Genome Biology and Evolution  2013;5(12):2393-2401.
Genes in pieces and spliceosomal introns are a landmark of eukaryotes, with intron invasion usually assumed to have happened early on in evolution. Here, we analyze the intron landscape of Micromonas, a unicellular green alga in the Mamiellophyceae lineage, demonstrating the coexistence of several classes of introns and the occurrence of recent massive intron invasion. This study focuses on two strains, CCMP1545 and RCC299, and their related individuals from ocean samplings, showing that they not only harbor different classes of introns depending on their location in the genome, as for other Mamiellophyceae, but also uniquely carry several classes of repeat introns. These introns, dubbed introner elements (IEs), are found at novel positions in genes and have conserved sequences, contrary to canonical introns. This IE invasion has a huge impact on the genome, doubling the number of introns in the CCMP1545 strain. We hypothesize that each IE class originated from a single ancestral IE that has been colonizing the genome after strain divergence by inserting copies of itself into genes by intron transposition, likely involving reverse splicing. Along with similar cases recently observed in other organisms, our observations in Micromonas strains shed a new light on the evolution of introns, suggesting that intron gain is more widespread than previously thought.
PMCID: PMC3879977  PMID: 24273312
intron evolution; intron gain; Mamiellophyceae; Micromonas; introner elements
8.  Integrative Genomic Analysis Implicates Gain of PIK3CA at 3q26 and MYC at 8q24 in Chronic Lymphocytic Leukemia 
The disease course of chronic lymphocytic leukemia (CLL) varies significantly within cytogenetic groups. We hypothesized that high resolution genomic analysis of CLL would identify additional recurrent abnormalities associated with short time to first therapy (TTFT).
Experimental Design
We undertook high resolution genomic analysis of 161 prospectively enrolled CLLs using Affymetrix 6.0 SNP arrays, and integrated analysis of this dataset with gene expression profiles.
Copy number analysis (CNA) of nonprogressive CLL reveals a stable genotype, with a median of only 1 somatic CNA per sample. Progressive CLL with 13q deletion was associated with additional somatic CNAs, and a greater number of CNAs was predictive of TTFT. We identified other recurrent CNAs associated with short TTFT: 8q24 amplification focused on the cancer susceptibility locus near MYC in 3.7%; 3q26 amplifications focused on PIK3CA in 5.6%; and 8p deletions in 5% of patients. Sequencing of MYC further identified somatic mutations in two CLLs. We determined which catalytic subunits of PI3K were in active complex with the p85 regulatory subunit, and demonstrated enrichment for the alpha subunit in three CLLs carrying PIK3CA amplification.
Our findings implicate amplifications of 3q26 focused on PIK3CA and 8q24 focused on MYC in CLL.
PMCID: PMC3719990  PMID: 22623730
CLL; PIK3CA; MYC; genomics; copy number
9.  Reannotation and extended community resources for the genome of the non-seed plant Physcomitrella patens provide insights into the evolution of plant gene structures and functions 
BMC Genomics  2013;14:498.
The moss Physcomitrella patens as a model species provides an important reference for early-diverging lineages of plants and the release of the genome in 2008 opened the doors to genome-wide studies. The usability of a reference genome greatly depends on the quality of the annotation and the availability of centralized community resources. Therefore, in the light of accumulating evidence for missing genes, fragmentary gene structures, false annotations and a low rate of functional annotations on the original release, we decided to improve the moss genome annotation.
Here, we report the complete moss genome re-annotation (designated V1.6) incorporating the increased transcript availability from a multitude of developmental stages and tissue types. We demonstrate the utility of the improved P. patens genome annotation for comparative genomics and new extensions to the resource as a central repository for this plant “flagship” genome. The structural annotation of 32,275 protein-coding genes results in 8387 additional loci including 1456 loci with known protein domains or homologs in Plantae. This is the first release to include information on transcript isoforms, suggesting alternative splicing events for at least 10.8% of the loci. Furthermore, this release now also provides information on non-protein-coding loci. Functional annotations were improved regarding quality and coverage, resulting in 58% annotated loci (previously: 41%) that comprise also 7200 additional loci with GO annotations. Access and manual curation of the functional and structural genome annotation is provided via the model organism database.
Comparative analysis of gene structure evolution along the green plant lineage provides novel insights, such as a comparatively high number of loci with 5’-UTR introns in the moss. Comparative analysis of functional annotations reveals expansions of moss house-keeping and metabolic genes and further possibly adaptive, lineage-specific expansions and gains including at least 13% orphan genes.
PMCID: PMC3729371  PMID: 23879659
Bryophyte; Physcomitrella patens; Genome annotation; Gene structure; Reference genome; Model organism; UTR; Plant evolution; Non-flowering plant; Orphan genes
10.  Large-Scale Event Extraction from Literature with Multi-Level Gene Normalization 
PLoS ONE  2013;8(4):e55814.
Text mining for the life sciences aims to aid database curation, knowledge summarization and information retrieval through the automated processing of biomedical texts. To provide comprehensive coverage and enable full integration with existing biomolecular database records, it is crucial that text mining tools scale up to millions of articles and that their analyses can be unambiguously linked to information recorded in resources such as UniProt, KEGG, BioGRID and NCBI databases. In this study, we investigate how fully automated text mining of complex biomolecular events can be augmented with a normalization strategy that identifies biological concepts in text, mapping them to identifiers at varying levels of granularity, ranging from canonicalized symbols to unique gene and proteins and broad gene families. To this end, we have combined two state-of-the-art text mining components, previously evaluated on two community-wide challenges, and have extended and improved upon these methods by exploiting their complementary nature. Using these systems, we perform normalization and event extraction to create a large-scale resource that is publicly available, unique in semantic scope, and covers all 21.9 million PubMed abstracts and 460 thousand PubMed Central open access full-text articles. This dataset contains 40 million biomolecular events involving 76 million gene/protein mentions, linked to 122 thousand distinct genes from 5032 species across the full taxonomic tree. Detailed evaluations and analyses reveal promising results for application of this data in database and pathway curation efforts. The main software components used in this study are released under an open-source license. 
Further, the resulting dataset is freely accessible through a novel API, providing programmatic and customized access. Finally, to allow for large-scale bioinformatic analyses, the entire resource is available for bulk download under the Creative Commons – Attribution – Share Alike (CC BY-SA) license.
PMCID: PMC3629104  PMID: 23613707
11.  Gene functionalities and genome structure in Bathycoccus prasinos reflect cellular specializations at the base of the green lineage 
Genome Biology  2012;13(8):R74.
Bathycoccus prasinos is an extremely small cosmopolitan marine green alga whose cells are covered with intricate spider's web patterned scales that develop within the Golgi cisternae before their transport to the cell surface. The objective of this work is to sequence and analyze its genome, and to present a comparative analysis with other known genomes of the green lineage.
Its small genome of 15 Mb consists of 19 chromosomes and lacks transposons. Although 70% of all B. prasinos genes share similarities with other Viridiplantae genes, up to 428 genes were probably acquired by horizontal gene transfer, mainly from other eukaryotes. Two chromosomes, one big and one small, are atypical, an unusual synapomorphic feature within the Mamiellales. Genes on these atypical outlier chromosomes show lower GC content and a significant fraction of putative horizontal gene transfer genes. Whereas the small outlier chromosome lacks colinearity with other Mamiellales and contains many unknown genes without homologs in other species, the big outlier shows a higher intron content, increased expression levels and a unique clustering pattern of housekeeping functionalities. Four gene families are highly expanded in B. prasinos, including sialyltransferases, sialidases, ankyrin repeats and zinc ion-binding genes, and we hypothesize that these genes are associated with the process of scale biogenesis.
The minimal genomes of the Mamiellophyceae provide a baseline for evolutionary and functional analyses of metabolic processes in green plants.
PMCID: PMC3491373  PMID: 22925495
12.  Transcriptional Profiling of Plasmodium falciparum Parasites from Patients with Severe Malaria Identifies Distinct Low vs. High Parasitemic Clusters 
PLoS ONE  2012;7(7):e40739.
In the past decade, estimates of malaria infections have dropped from 500 million to 225 million per year; likewise, mortality rates have dropped from 3 million to 791,000 per year. However, approximately 90% of these deaths continue to occur in sub-Saharan Africa, and 85% involve children less than 5 years of age. Malaria mortality in children generally results from one or more of the following clinical syndromes: severe anemia, acidosis, and cerebral malaria (CM). Although much is known about the clinical and pathological manifestations of CM, insights into the biology of the malaria parasite, specifically transcription during this manifestation of severe infection, are lacking.
Methods and Findings
We collected peripheral blood from children meeting the clinical case definition of cerebral malaria from a cohort in Malawi, examined the patients for the presence or absence of malaria retinopathy, and performed whole genome transcriptional profiling for Plasmodium falciparum using a custom designed Affymetrix array. We identified two distinct physiological states that showed highly significant association with the level of parasitemia. We compared both groups of Malawi expression profiles with our previously acquired ex vivo expression profiles of parasites derived from infected patients with mild disease; a large collection of in vitro Plasmodium falciparum life cycle gene expression profiles; and an extensively annotated compendium of expression data from Saccharomyces cerevisiae. The high parasitemia patient group demonstrated a unique biology with elevated expression of Hrd1, a member of endoplasmic reticulum-associated protein degradation system.
The presence of a unique high parasitemia state may be indicative of the parasite biology of the clinically recognized hyperparasitemic severe disease syndrome.
PMCID: PMC3399889  PMID: 22815802
13.  Semantically linking molecular entities in literature through entity relationships 
BMC Bioinformatics  2012;13(Suppl 11):S6.
Text mining tools have gained popularity for processing the vast amount of available research articles in the biomedical literature. It is crucial that such tools extract information with a sufficient level of detail to be applicable in real life scenarios. Studies of mining non-causal molecular relations contribute to this goal by formally identifying the relations between genes, promoters, complexes and various other molecular entities found in text. More importantly, these studies help to enhance integration of text mining results with database facts.
We describe, compare and evaluate two frameworks developed for the prediction of non-causal or 'entity' relations (REL) between gene symbols and domain terms. For the corresponding REL challenge of the BioNLP Shared Task of 2011, these systems ranked first (57.7% F-score) and second (41.6% F-score). In this paper, we investigate the performance discrepancy of 16 percentage points by benchmarking on a related and more extensive dataset, analysing the contribution of both the term detection and relation extraction modules. We further construct a hybrid system combining the two frameworks and experiment with intersection and union combinations, achieving respectively high-precision and high-recall results. Finally, we highlight extremely high-performance results (F-score >90%) obtained for the specific subclass of embedded entity relations that are essential for integrating text mining predictions with database facts.
The results from this study will enable us in the near future to annotate semantic relations between molecular entities in the entire scientific literature available through PubMed. The recent release of the EVEX dataset, containing biomolecular event predictions for millions of PubMed articles, is an interesting and exciting opportunity to overlay these entity relations with event predictions on a literature-wide scale.
PMCID: PMC3384255  PMID: 22759460
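The intersection and union combinations mentioned in this abstract can be illustrated with plain set operations: keeping only relations predicted by both systems tends to raise precision, while pooling the predictions of either system tends to raise recall. The sketch below uses made-up relation identifiers and a hypothetical helper `prf`; it does not reproduce the paper's systems or scores.

```python
def prf(predicted, gold):
    """Return (precision, recall, F1) for a predicted set against a gold set."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if true_positives == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up gold standard and system outputs, for illustration only.
gold = {"r1", "r2", "r3", "r4"}
system_a = {"r1", "r2", "r5"}
system_b = {"r2", "r3", "r6"}

for label, predictions in [
    ("system A", system_a),
    ("system B", system_b),
    ("intersection", system_a & system_b),  # high-precision combination
    ("union", system_a | system_b),         # high-recall combination
]:
    p, r, f = prf(predictions, gold)
    print(f"{label:12s} P={p:.2f} R={r:.2f} F1={f:.2f}")
```

In this toy case the intersection reaches perfect precision at the cost of recall, and the union recovers more gold relations while admitting both systems' false positives.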
14.  The Medicago Genome Provides Insight into the Evolution of Rhizobial Symbioses 
Young, Nevin D. | Debellé, Frédéric | Oldroyd, Giles E. D. | Geurts, Rene | Cannon, Steven B. | Udvardi, Michael K. | Benedito, Vagner A. | Mayer, Klaus F. X. | Gouzy, Jérôme | Schoof, Heiko | Van de Peer, Yves | Proost, Sebastian | Cook, Douglas R. | Meyers, Blake C. | Spannagl, Manuel | Cheung, Foo | De Mita, Stéphane | Krishnakumar, Vivek | Gundlach, Heidrun | Zhou, Shiguo | Mudge, Joann | Bharti, Arvind K. | Murray, Jeremy D. | Naoumkina, Marina A. | Rosen, Benjamin | Silverstein, Kevin A. T. | Tang, Haibao | Rombauts, Stephane | Zhao, Patrick X. | Zhou, Peng | Barbe, Valérie | Bardou, Philippe | Bechner, Michael | Bellec, Arnaud | Berger, Anne | Bergès, Hélène | Bidwell, Shelby | Bisseling, Ton | Choisne, Nathalie | Couloux, Arnaud | Denny, Roxanne | Deshpande, Shweta | Dai, Xinbin | Doyle, Jeff | Dudez, Anne-Marie | Farmer, Andrew D. | Fouteau, Stéphanie | Franken, Carolien | Gibelin, Chrystel | Gish, John | Goldstein, Steven | González, Alvaro J. | Green, Pamela J. | Hallab, Asis | Hartog, Marijke | Hua, Axin | Humphray, Sean | Jeong, Dong-Hoon | Jing, Yi | Jöcker, Anika | Kenton, Steve M. | Kim, Dong-Jin | Klee, Kathrin | Lai, Hongshing | Lang, Chunting | Lin, Shaoping | Macmil, Simone L | Magdelenat, Ghislaine | Matthews, Lucy | McCorrison, Jamison | Monaghan, Erin L. | Mun, Jeong-Hwan | Najar, Fares Z. | Nicholson, Christine | Noirot, Céline | O’Bleness, Majesta | Paule, Charles R. | Poulain, Julie | Prion, Florent | Qin, Baifang | Qu, Chunmei | Retzel, Ernest F. | Riddle, Claire | Sallet, Erika | Samain, Sylvie | Samson, Nicolas | Sanders, Iryna | Saurat, Olivier | Scarpelli, Claude | Schiex, Thomas | Segurens, Béatrice | Severin, Andrew J. | Sherrier, D. Janine | Shi, Ruihua | Sims, Sarah | Singer, Susan R. | Sinharoy, Senjuti | Sterck, Lieven | Viollet, Agnès | Wang, Bing-Bing | Wang, Keqin | Wang, Mingyi | Wang, Xiaohong | Warfsmann, Jens | Weissenbach, Jean | White, Doug D. | White, Jim D. | Wiley, Graham B. 
| Wincker, Patrick | Xing, Yanbo | Yang, Limei | Yao, Ziyun | Ying, Fu | Zhai, Jixian | Zhou, Liping | Zuber, Antoine | Dénarié, Jean | Dixon, Richard A. | May, Gregory D. | Schwartz, David C. | Rogers, Jane | Quétier, Francis | Town, Christopher D. | Roe, Bruce A.
Nature  2011;480(7378):520-524.
Legumes (Fabaceae or Leguminosae) are unique among cultivated plants for their ability to carry out endosymbiotic nitrogen fixation with rhizobial bacteria, a process that takes place in a specialized structure known as the nodule. Legumes belong to one of the two main groups of eurosids, the Fabidae, which includes most species capable of endosymbiotic nitrogen fixation 1. Legumes comprise several evolutionary lineages derived from a common ancestor 60 million years ago (Mya). Papilionoids are the largest clade, dating nearly to the origin of legumes and containing most cultivated species 2. Medicago truncatula (Mt) is a long-established model for the study of legume biology. Here we describe the draft sequence of the Mt euchromatin based on a recently completed BAC-assembly supplemented with Illumina-shotgun sequence, together capturing ~94% of all Mt genes. A whole-genome duplication (WGD) approximately 58 Mya played a major role in shaping the Mt genome and thereby contributed to the evolution of endosymbiotic nitrogen fixation. Subsequent to the WGD, the Mt genome experienced higher levels of rearrangement than two other sequenced legumes, Glycine max (Gm) and Lotus japonicus (Lj). Mt is a close relative of alfalfa (M. sativa), a widely cultivated crop with limited genomics tools and complex autotetraploid genetics. As such, the Mt genome sequence provides significant opportunities to expand alfalfa’s genomic toolbox.
PMCID: PMC3272368  PMID: 22089132
15.  Exploring Biomolecular Literature with EVEX: Connecting Genes through Events, Homology, and Indirect Associations 
Advances in Bioinformatics  2012;2012:582765.
Technological advancements in the field of genetics have led not only to an abundance of experimental data, but also caused an exponential increase of the number of published biomolecular studies. Text mining is widely accepted as a promising technique to help researchers in the life sciences deal with the amount of available literature. This paper presents a freely available web application built on top of 21.3 million detailed biomolecular events extracted from all PubMed abstracts. These text mining results were generated by a state-of-the-art event extraction system and enriched with gene family associations and abstract generalizations, accounting for lexical variants and synonymy. The EVEX resource locates relevant literature on phosphorylation, regulation targets, binding partners, and several other biomolecular events and assigns confidence values to these events. The search function accepts official gene/protein symbols as well as common names from all species. Finally, the web application is a powerful tool for generating homology-based hypotheses as well as novel, indirect associations between genes and proteins such as coregulators.
PMCID: PMC3375141  PMID: 22719757
16.  A mystery unveiled 
Genome Biology  2011;12(5):113.
A recent phylogenomic study has provided new evidence for two ancient whole genome duplications in plants, with potential importance for the evolution of seed and flowering plants.
PMCID: PMC3219959  PMID: 21635712
17.  Evaluation and Properties of the Budding Yeast Phosphoproteome 
Molecular & Cellular Proteomics : MCP  2012;11(6):M111.009555.
We have assembled a reliable phosphoproteomic data set for budding yeast Saccharomyces cerevisiae and have investigated its properties. Twelve publicly available phosphoproteome data sets were triaged to obtain a subset of high-confidence phosphorylation sites (p-sites), free of “noisy” phosphorylations. Analysis of this combined data set suggests that the inventory of phosphoproteins in yeast is close to completion, but that these proteins may have many undiscovered p-sites. Proteins involved in budding and protein kinase activity have high numbers of p-sites and are highly over-represented in the vast majority of the yeast phosphoproteome data sets. The yeast phosphoproteome is characterized by a few proteins with many p-sites and many proteins with a few p-sites. We confirm a tendency for p-sites to cluster together and find evidence that kinases may phosphorylate off-target amino acids that are within one or two residues of their cognate target. This suggests that the precise position of the phosphorylated amino acid is not a stringent requirement for regulatory fidelity. Compared with nonphosphorylated proteins, phosphoproteins are more ancient, more abundant, have longer unstructured regions, have more genetic interactions, more protein interactions, and are under tighter post-translational regulation. It appears that phosphoproteins constitute the raw material for pathway rewiring and adaptation at various evolutionary rates.
PMCID: PMC3433898  PMID: 22286756
18.  Deconstruction of the (Paleo)Polyploid Grapevine Genome Based on the Analysis of Transposition Events Involving NBS Resistance Genes 
PLoS ONE  2012;7(1):e29762.
Plants have followed a reticulate type of evolution and taxa have frequently merged via allopolyploidization. A polyploid structure of sequenced genomes has often been proposed, but the chromosomes belonging to putative component genomes are difficult to identify. The 19 grapevine chromosomes are evolutionarily stable structures: their homologous triplets have strongly conserved gene order, interrupted by rare translocations. The aim of this study is to examine how the grapevine nucleotide-binding site (NBS)-encoding resistance (NBS-R) genes have evolved in the genomic context and to understand the mechanisms of genome evolution. We show that, in grapevine, i) helitrons have significantly contributed to transposition of NBS-R genes, and ii) NBS-R gene cluster similarity indicates the existence of two groups of chromosomes (named as Va and Vc) that may have evolved independently. Chromosome triplets consist of two Va and one Vc chromosomes, as expected from the tetraploid and diploid conditions of the two component genomes. The hexaploid state could have been derived from either allopolyploidy or the separation of the Va and Vc component genomes in the same nucleus before fusion, as known for Rosaceae species. Time estimation indicates that grapevine component genomes may have fused about 60 mya, having had at least 40–60 mya to evolve independently. Chromosome number variation in the Vitaceae and related families, and the gap between the time of eudicot radiation and the age of Vitaceae fossils, are accounted for by our hypothesis.
PMCID: PMC3256180  PMID: 22253773
20.  i-ADHoRe 3.0—fast and sensitive detection of genomic homology in extremely large data sets 
Nucleic Acids Research  2011;40(2):e11.
Comparative genomics is a powerful means to gain insight into the evolutionary processes that shape the genomes of related species. As the number of sequenced genomes increases, the development of software to perform accurate cross-species analyses becomes indispensable. However, many implementations that have the ability to compare multiple genomes exhibit unfavorable computational and memory requirements, limiting the number of genomes that can be analyzed in one run. Here, we present a software package to unveil genomic homology based on the identification of conservation of gene content and gene order (collinearity), i-ADHoRe 3.0, and its application to eukaryotic genomes. The use of efficient algorithms and support for parallel computing enable the analysis of large-scale data sets. Unlike other tools, i-ADHoRe can process the Ensembl data set, containing 49 species, in 1 h. Furthermore, the profile search is more sensitive in detecting degenerate genomic homology than chaining pairwise collinearity information based on transitive homology. From ultra-conserved collinear regions between mammals and birds, by integrating coexpression information and protein–protein interactions, we identified more than 400 regions in the human genome showing significant functional coherence. The different algorithmic improvements ensure that i-ADHoRe 3.0 will remain a powerful tool to study genome evolution.
PMCID: PMC3258164  PMID: 22102584
21.  GenomeView: a next-generation genome browser 
Nucleic Acids Research  2011;40(2):e12.
Due to ongoing advances in sequencing technologies, billions of nucleotide sequences are now produced on a daily basis. A major challenge is to visualize these data for further downstream analysis. To this end, we present GenomeView, a stand-alone genome browser specifically designed to visualize and manipulate a multitude of genomics data. GenomeView enables users to dynamically browse high volumes of aligned short-read data, with dynamic navigation and semantic zooming, from the whole genome level to the single nucleotide. At the same time, the tool enables visualization of whole genome alignments of dozens of genomes relative to a reference sequence. GenomeView is unique in its capability to interactively handle huge data sets consisting of tens of aligned genomes, thousands of annotation features and millions of mapped short reads both as viewer and editor. GenomeView is freely available as an open source software package.
PMCID: PMC3258165  PMID: 22102585
22.  The Arabidopsis lyrata genome sequence and the basis of rapid genome size change 
Nature genetics  2011;43(5):476-481.
We present the 207 Mb genome sequence of the outcrosser Arabidopsis lyrata, which diverged from the self-fertilizing species A. thaliana about 10 million years ago. It is generally assumed that the much smaller A. thaliana genome, which is only 125 Mb, constitutes the derived state for the family. Apparent genome reduction in this genus can be partially attributed to the loss of DNA from large-scale rearrangements, but the main cause lies in the hundreds of thousands of small deletions found throughout the genome. These occurred primarily in non-coding DNA and transposons, but protein-coding multi-gene families are smaller in A. thaliana as well. Analysis of deletions and insertions still segregating in A. thaliana indicates that the process of DNA loss is ongoing, suggesting pervasive selection for a smaller genome.
PMCID: PMC3083492  PMID: 21478890
23.  Comparative genomics of the pathogenic ciliate Ichthyophthirius multifiliis, its free-living relatives and a host species provide insights into adoption of a parasitic lifestyle and prospects for disease control 
Genome Biology  2011;12(10):R100.
Ichthyophthirius multifiliis, commonly known as Ich, is a highly pathogenic ciliate responsible for 'white spot', a disease causing significant economic losses to the global aquaculture industry. Options for disease control are extremely limited, and Ich's obligate parasitic lifestyle makes experimental studies challenging. Unlike most well-studied protozoan parasites, Ich belongs to a phylum composed primarily of free-living members. Indeed, it is closely related to the model organism Tetrahymena thermophila. Genomic studies represent a promising strategy to reduce the impact of this disease and to understand the evolutionary transition to parasitism.
We report the sequencing, assembly and annotation of the Ich macronuclear genome. Compared with its free-living relative T. thermophila, the Ich genome is reduced approximately two-fold in length and gene density and three-fold in gene content. We analyzed in detail several gene classes with diverse functions in behavior, cellular function and host immunogenicity, including protein kinases, membrane transporters, proteases, surface antigens and cytoskeletal components and regulators. We also mapped by orthology Ich's metabolic pathways in comparison with other ciliates and a potential host organism, the zebrafish Danio rerio.
Knowledge of the complete protein-coding and metabolic potential of Ich opens avenues for rational testing of therapeutic drugs that target functions essential to this parasite but not to its fish hosts. Also, a catalog of surface protein-encoding genes will facilitate development of more effective vaccines. The potential to use T. thermophila as a surrogate model offers promise toward controlling 'white spot' disease and understanding the adaptation to a parasitic lifestyle.
PMCID: PMC3341644  PMID: 22004680
25.  Structural and functional organization of RNA regulons in the post-transcriptional regulatory network of yeast 
Nucleic Acids Research  2011;39(21):9108-9117.
Post-transcriptional control of mRNA transcript processing by RNA binding proteins (RBPs) is an important step in the regulation of gene expression and protein production. The post-transcriptional regulatory network is similar in complexity to the transcriptional regulatory network and is thought to be organized in RNA regulons, coherent sets of functionally related mRNAs combinatorially regulated by common RBPs. We integrated genome-wide transcriptional and translational expression data in yeast with large-scale regulatory networks of transcription factor and RBP binding interactions to analyze the functional organization of post-transcriptional regulation and RNA regulons at a system level. We found that post-transcriptional feedback loops and mixed bifan motifs are overrepresented in the integrated regulatory network and control the coordinated translation of RNA regulons, manifested as clusters of functionally related mRNAs which are strongly coexpressed in the translatome data. These translatome clusters are more functionally coherent than transcriptome clusters and are expressed with higher mRNA and protein levels and less noise. Our results show how the post-transcriptional network is intertwined with the transcriptional network to regulate gene expression in a coordinated way, and that integrating heterogeneous genome-wide datasets makes it possible to relate structure to function in regulatory networks at a system level.
PMCID: PMC3241661  PMID: 21840901
