Results 1-9 (9)

1.  PhylDiag: identifying complex synteny blocks that include tandem duplications using phylogenetic gene trees 
BMC Bioinformatics  2014;15(1):268.
Extant genomes share regions where genes have the same order and orientation, which are thought to arise from the conservation of an ancestral order of genes during evolution. Such regions of so-called conserved synteny, or synteny blocks, must be precisely identified and quantified, as a prerequisite to better understand the evolutionary history of genomes.
Here we describe PhylDiag, a software tool that identifies statistically significant synteny blocks in pairwise comparisons of eukaryote genomes. Compared to previous methods, PhylDiag uses gene trees to define gene homologies, thus allowing gene deletions to be considered as events that may break the synteny. PhylDiag also accounts for gene orientations, blocks of tandem duplicates and lineage-specific de novo gene births. Starting from two genomes and the corresponding gene trees, PhylDiag returns synteny blocks with gaps less than or equal to the maximum gap parameter gapmax. This parameter is theoretically estimated, and together with a utility to graphically display results, contributes to making PhylDiag a user-friendly method. In addition, putative synteny blocks are subject to a statistical validation to verify that they are unlikely to be due to a random combination of genes.
We benchmark several known metrics to measure 2D-distances in a matrix of homologies and we compare PhylDiag to i-ADHoRe 3.0 on real and simulated data. We show that PhylDiag correctly identifies small synteny blocks even with insertions, deletions, incorrect annotations or micro-inversions. Finally, PhylDiag allowed us to identify the most relevant distance metric for 2D-distance calculation between homologies.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2105-15-268) contains supplementary material, which is available to authorized users.
PMCID: PMC4155083  PMID: 25103980
Comparative genomics; Synteny; Synteny block; Segmental homologies; Homology; Gene order; Rearrangement; Ancestral genome; Gene tree
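The gap criterion and the 2D-distance metrics mentioned in the PhylDiag abstract above can be illustrated with a small sketch. This is not PhylDiag's implementation: it assumes each homology is a point (i, j) giving gene ranks in the two genomes, treats gap_max as a simple threshold on the chosen distance, ignores gene orientation and statistical validation, and uses hypothetical function names throughout.

```python
import math

# Three common 2D distances between homologies p = (i1, j1) and q = (i2, j2).
def chebyshev(p, q):
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def euclidean(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def cluster_homologies(points, gap_max, distance=chebyshev):
    """Group homologies into candidate blocks by single linkage.

    Assumption for illustration: two homologies belong to the same block
    when their distance is <= gap_max, directly or through a chain of
    intermediate homologies.
    """
    parent = list(range(len(points)))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a in range(len(points)):
        for b in range(a + 1, len(points)):
            if distance(points[a], points[b]) <= gap_max:
                parent[find(a)] = find(b)

    blocks = {}
    for idx, point in enumerate(points):
        blocks.setdefault(find(idx), []).append(point)
    return sorted(sorted(block) for block in blocks.values())

# Toy homology matrix: one diagonal with a small gap, plus a distant pair.
homologies = [(0, 0), (1, 1), (3, 3), (10, 2), (11, 3)]
blocks_cd = cluster_homologies(homologies, gap_max=2, distance=chebyshev)
blocks_md = cluster_homologies(homologies, gap_max=2, distance=manhattan)
```

With the same gap_max, the Chebyshev distance keeps (3, 3) in the first block while the Manhattan distance splits it off, which shows why the choice of metric affects the blocks that are reported.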
2.  The zebrafish reference genome sequence and its relationship to the human genome 
Howe, Kerstin | Clark, Matthew D. | Torroja, Carlos F. | Torrance, James | Berthelot, Camille | Muffato, Matthieu | Collins, John E. | Humphray, Sean | McLaren, Karen | Matthews, Lucy | McLaren, Stuart | Sealy, Ian | Caccamo, Mario | Churcher, Carol | Scott, Carol | Barrett, Jeffrey C. | Koch, Romke | Rauch, Gerd-Jörg | White, Simon | Chow, William | Kilian, Britt | Quintais, Leonor T. | Guerra-Assunção, José A. | Zhou, Yi | Gu, Yong | Yen, Jennifer | Vogel, Jan-Hinnerk | Eyre, Tina | Redmond, Seth | Banerjee, Ruby | Chi, Jianxiang | Fu, Beiyuan | Langley, Elizabeth | Maguire, Sean F. | Laird, Gavin K. | Lloyd, David | Kenyon, Emma | Donaldson, Sarah | Sehra, Harminder | Almeida-King, Jeff | Loveland, Jane | Trevanion, Stephen | Jones, Matt | Quail, Mike | Willey, Dave | Hunt, Adrienne | Burton, John | Sims, Sarah | McLay, Kirsten | Plumb, Bob | Davis, Joy | Clee, Chris | Oliver, Karen | Clark, Richard | Riddle, Clare | Eliott, David | Threadgold, Glen | Harden, Glenn | Ware, Darren | Mortimer, Beverly | Kerry, Giselle | Heath, Paul | Phillimore, Benjamin | Tracey, Alan | Corby, Nicole | Dunn, Matthew | Johnson, Christopher | Wood, Jonathan | Clark, Susan | Pelan, Sarah | Griffiths, Guy | Smith, Michelle | Glithero, Rebecca | Howden, Philip | Barker, Nicholas | Stevens, Christopher | Harley, Joanna | Holt, Karen | Panagiotidis, Georgios | Lovell, Jamieson | Beasley, Helen | Henderson, Carl | Gordon, Daria | Auger, Katherine | Wright, Deborah | Collins, Joanna | Raisen, Claire | Dyer, Lauren | Leung, Kenric | Robertson, Lauren | Ambridge, Kirsty | Leongamornlert, Daniel | McGuire, Sarah | Gilderthorp, Ruth | Griffiths, Coline | Manthravadi, Deepa | Nichol, Sarah | Barker, Gary | Whitehead, Siobhan | Kay, Michael | Brown, Jacqueline | Murnane, Clare | Gray, Emma | Humphries, Matthew | Sycamore, Neil | Barker, Darren | Saunders, David | Wallis, Justene | Babbage, Anne | Hammond, Sian | Mashreghi-Mohammadi, Maryam | Barr, Lucy | Martin, Sancha | Wray, Paul | Ellington, Andrew | Matthews, Nicholas | Ellwood, Matthew | Woodmansey, Rebecca | Clark, Graham | Cooper, James | Tromans, Anthony | Grafham, Darren | Skuce, Carl | Pandian, Richard | Andrews, Robert | Harrison, Elliot | Kimberley, Andrew | Garnett, Jane | Fosker, Nigel | Hall, Rebekah | Garner, Patrick | Kelly, Daniel | Bird, Christine | Palmer, Sophie | Gehring, Ines | Berger, Andrea | Dooley, Christopher M. | Ersan-Ürün, Zübeyde | Eser, Cigdem | Geiger, Horst | Geisler, Maria | Karotki, Lena | Kirn, Anette | Konantz, Judith | Konantz, Martina | Oberländer, Martina | Rudolph-Geiger, Silke | Teucke, Mathias | Osoegawa, Kazutoyo | Zhu, Baoli | Rapp, Amanda | Widaa, Sara | Langford, Cordelia | Yang, Fengtang | Carter, Nigel P. | Harrow, Jennifer | Ning, Zemin | Herrero, Javier | Searle, Steve M. J. | Enright, Anton | Geisler, Robert | Plasterk, Ronald H. A. | Lee, Charles | Westerfield, Monte | de Jong, Pieter J. | Zon, Leonard I. | Postlethwait, John H. | Nüsslein-Volhard, Christiane | Hubbard, Tim J. P. | Crollius, Hugues Roest | Rogers, Jane | Stemple, Derek L.
Nature  2013;496(7446):498-503.
Zebrafish have become a popular organism for the study of vertebrate gene function [1,2]. The virtually transparent embryos of this species, and the ability to accelerate genetic studies by gene knockdown or overexpression, have led to the widespread use of zebrafish in the detailed investigation of vertebrate gene function and increasingly, the study of human genetic disease [3–5]. However, for effective modelling of human genetic disease it is important to understand the extent to which zebrafish genes and gene structures are related to orthologous human genes. To examine this, we generated a high-quality sequence assembly of the zebrafish genome, made up of an overlapping set of completely sequenced large-insert clones that were ordered and oriented using a high-resolution high-density meiotic map. Detailed automatic and manual annotation provides evidence of more than 26,000 protein-coding genes [6], the largest gene set of any vertebrate so far sequenced. Comparison to the human reference genome shows that approximately 70% of human genes have at least one obvious zebrafish orthologue. In addition, the high quality of this genome assembly provides a clearer understanding of key genomic features such as a unique repeat content, a scarcity of pseudogenes, an enrichment of zebrafish-specific genes on chromosome 4 and chromosomal regions that influence sex determination.
PMCID: PMC3703927  PMID: 23594743
3.  Plasticity of Animal Genome Architecture Unmasked by Rapid Evolution of a Pelagic Tunicate 
Science (New York, N.Y.)  2010;330(6009):1381-1385.
Genomes of animals as different as sponges and humans show conservation of global architecture. Here we show that multiple genomic features including transposon diversity, developmental gene repertoire, physical gene order, and intron-exon organization are shattered in the tunicate Oikopleura, belonging to the sister group of vertebrates and retaining chordate morphology. Ancestral architecture of animal genomes can be deeply modified and may therefore be largely nonadaptive. This rapidly evolving animal lineage thus offers unique perspectives on the level of genome plasticity. It also illuminates issues as fundamental as the mechanisms of intron gain.
PMCID: PMC3760481  PMID: 21097902
4.  Preventing Dangerous Nonsense: Selection for Robustness to Transcriptional Error in Human Genes 
PLoS Genetics  2011;7(10):e1002276.
Nonsense Mediated Decay (NMD) degrades transcripts that contain a premature STOP codon resulting from mistranscription or missplicing. However, NMD's surveillance of gene expression varies in efficiency both among and within human genes. Previous work has shown that the intron content of human genes is influenced by missplicing events invisible to NMD. Given the high rate of transcriptional errors in eukaryotes, we hypothesized that natural selection has promoted a dual strategy of “prevention and cure” to alleviate the problem of nonsense transcriptional errors. A prediction of this hypothesis is that NMD's inefficiency should leave a signature of “transcriptional robustness” in human gene sequences that reduces the frequency of nonsense transcriptional errors. For human genes we determined the usage of “fragile” codons, prone to mistranscription into STOP codons, relative to the usage of “robust” codons that do not generate nonsense errors. We observe that single-exon genes have evolved to become robust to mistranscription, because they show a significant tendency to avoid fragile codons relative to robust codons when compared to multi-exon genes. A similar depletion is evident in last exons of multi-exon genes. Histone genes are particularly depleted of fragile codons and thus highly robust to transcriptional errors. Finally, the protein products of single-exon genes show a strong tendency to avoid those amino acids that can only be encoded using fragile codons. Each of these observations can be attributed to NMD deficiency. Thus, in the human genome, wherever the “cure” for nonsense (i.e. NMD) is inefficient, there is increased reliance on the strategy of nonsense “prevention” (i.e. transcriptional robustness). This study shows that human genes are exposed to the deleterious influence of transcriptional errors. Moreover, it suggests that gene expression errors are an underestimated phenomenon, in molecular evolution in general and in selection for genomic robustness in particular.
Author Summary
In biological systems mistakes are made constantly because the cellular machinery is complex and error-prone. Mistakes are made in copying DNA for transmission to offspring (“genetic mutations”) but are much more frequent in the day-to-day task of gene expression. Genetic mutations are heritable and therefore have long been the almost exclusive focus of evolutionary genetics research. In contrast, gene expression errors are not inherited and have tended to be disregarded in evolutionary studies. Here we show how human genes have evolved a mechanism to reduce the occurrence of a specific type of gene expression error: transcriptional errors that create premature STOP codons (so-called “nonsense errors”). Nonsense errors are potentially highly toxic for the cell, so natural selection has evolved a strategy called Nonsense Mediated Decay (NMD) to “cure” such errors. However, this cure is inefficient. Here we describe how a preventative strategy of “transcriptional robustness” has evolved to decrease the frequency of nonsense errors. Moreover, these “prevention and cure” strategies are used interchangeably: the most transcriptionally robust genes are those for which NMD is most inefficient. Our work implies that gene expression errors play an important role as supporting actors to genetic mutations in molecular evolution, particularly in the evolution of robustness.
PMCID: PMC3192821  PMID: 22022272
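The "fragile" versus "robust" codon distinction in the abstract above can be made concrete with a short sketch. It assumes, as an illustration rather than the authors' exact definition, that a fragile codon is any sense codon that a single-nucleotide substitution can turn into a STOP codon of the standard genetic code; the function names are hypothetical and sequences are written in the DNA alphabet.

```python
from itertools import product

BASES = "ACGT"
STOPS = {"TAA", "TAG", "TGA"}  # standard genetic code, DNA alphabet

def single_substitutions(codon):
    """Yield every codon that differs from `codon` at exactly one position."""
    for i, base in enumerate(codon):
        for alt in BASES:
            if alt != base:
                yield codon[:i] + alt + codon[i + 1:]

def is_fragile(codon):
    # Assumed definition: a sense codon one substitution away from a STOP.
    return codon not in STOPS and any(
        neighbour in STOPS for neighbour in single_substitutions(codon)
    )

ALL_CODONS = ["".join(triplet) for triplet in product(BASES, repeat=3)]
SENSE = [c for c in ALL_CODONS if c not in STOPS]
FRAGILE = sorted(c for c in SENSE if is_fragile(c))
ROBUST = sorted(c for c in SENSE if not is_fragile(c))

def fragile_fraction(cds):
    """Fraction of fragile codons among the sense codons of an in-frame sequence."""
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    sense = [c for c in codons if c not in STOPS]
    if not sense:
        return 0.0
    return sum(is_fragile(c) for c in sense) / len(sense)

example = fragile_fraction("TGGCCCAAA")  # codons TGG, CCC, AAA
```

Under this assumed definition 18 of the 61 sense codons are fragile. Comparing fragile_fraction between gene classes (for example single-exon versus multi-exon genes) is the kind of usage comparison the abstract describes, although the study's actual statistics are not reproduced here.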
5.  Genomicus: a database and a browser to study gene synteny in modern and ancestral genomes 
Bioinformatics  2010;26(8):1119-1121.
Summary: Comparative genomics remains a pivotal strategy to study the evolution of gene organization, and this primacy is reinforced by the growing number of full genome sequences available in public repositories. Despite this growth, bioinformatic tools available to visualize and compare genomes and to infer evolutionary events remain restricted to two or three genomes at a time, thus limiting the breadth and the nature of the question that can be investigated. Here we present Genomicus, a new synteny browser that can represent and compare unlimited numbers of genomes in a broad phylogenetic view. In addition, Genomicus includes reconstructed ancestral gene organization, thus greatly facilitating the interpretation of the data.
Availability: Genomicus is freely available for online use at while data can be downloaded at
PMCID: PMC2853686  PMID: 20185404
6.  Exogean: a framework for annotating protein-coding genes in eukaryotic genomic DNA 
Genome Biology  2006;7(Suppl 1):S7.
Accurate and automatic gene identification in eukaryotic genomic DNA is more than ever of crucial importance to efficiently exploit the large volume of assembled genome sequences available to the community. Automatic methods have always been considered less reliable than human expertise. This is illustrated in the EGASP project, where reference annotations against which all automatic methods are measured are generated by human annotators and experimentally verified. We hypothesized that replicating the accuracy of human annotators in an automatic method could be achieved by formalizing the rules and decisions that they use, in a mathematical formalism.
We have developed Exogean, a flexible framework based on directed acyclic colored multigraphs (DACMs) that can represent biological objects (for example, mRNA, ESTs, protein alignments, exons) and relationships between them. Graphs are analyzed to process the information according to rules that replicate those used by human annotators. Simple individual starting objects given as input to Exogean are thus combined and synthesized into complex objects such as protein coding transcripts.
We show here, in the context of the EGASP project, that Exogean is currently the method that best reproduces protein coding gene annotations from human experts, in terms of identifying at least one exact coding sequence per gene. We discuss current limitations of the method and several avenues for improvement.
PMCID: PMC1810556  PMID: 16925841
7.  Detecting single DNA copy number variations in complex genomes using one nanogram of starting DNA and BAC-array CGH 
Nucleic Acids Research  2004;32(13):e112.
Comparative genomic hybridization to bacterial artificial chromosome (BAC)-arrays (array-CGH) is a highly efficient technique, allowing the simultaneous measurement of genomic DNA copy number at hundreds or thousands of loci, and the reliable detection of local one-copy-level variations. We report a genome-wide amplification method allowing the same measurement sensitivity, using 1 ng of starting genomic DNA, instead of the classical 1 μg usually necessary. Using a discrete series of DNA fragments, we defined the parameters adapted to the most faithful ligation-mediated PCR amplification and the limits of the technique. The optimized protocol allows a 3000-fold DNA amplification, retaining the quantitative characteristics of the initial genome. Validation of the amplification procedure, using DNA from 10 tumour cell lines hybridized to BAC-arrays of 1500 spots, showed almost perfectly superimposed ratios for the non-amplified and amplified DNAs. Correlation coefficients of 0.96 and 0.99 were observed for regions of low-copy-level variations and all regions, respectively (including in vivo amplified oncogenes). Finally, labelling DNA using two nucleotides bearing the same fluorophore led to a significant increase in reproducibility and to the correct detection of one-copy gain or loss in >90% of the analysed data, even for pseudotriploid tumour genomes.
PMCID: PMC506828  PMID: 15284333
8.  Comparative genomic analysis reveals independent expansion of a lineage-specific gene family in vertebrates: The class II cytokine receptors and their ligands in mammals and fish 
BMC Genomics  2003;4:29.
The high degree of sequence conservation between coding regions in fish and mammals can be exploited to identify genes in mammalian genomes by comparison with the sequence of similar genes in fish. Conversely, experimentally characterized mammalian genes may be used to annotate fish genomes. However, gene families that escape this principle include the rapidly diverging cytokines that regulate the immune system, and their receptors. A classic example is the class II helical cytokines (HCII) including type I, type II and lambda interferons, IL10 related cytokines (IL10, IL19, IL20, IL22, IL24 and IL26) and their receptors (HCRII). Despite the report of a near complete pufferfish (Takifugu rubripes) genome sequence, these genes remain undescribed in fish.
We have used an original strategy based both on conserved amino acid sequence and gene structure to identify HCII and HCRII in the genome of another pufferfish, Tetraodon nigroviridis, which is amenable to laboratory experiments. The 15 genes that were identified are highly divergent and include a single interferon molecule, three IL10 related cytokines and their potential receptors together with two Tissue Factor (TF). Some of these genes form tandem clusters on the Tetraodon genome. Their expression pattern was determined in different tissues. Most importantly, Tetraodon interferon was identified and we show that the recombinant protein can induce antiviral MX gene expression in Tetraodon primary kidney cells. Similar results were obtained in zebrafish, which has 7 MX genes.
We propose a scheme for the evolution of HCII and their receptors during the radiation of bony vertebrates and suggest that the diversification that played an important role in the fine-tuning of the ancestral mechanism for host defense against infections probably followed different pathways in amniotes and fish.
PMCID: PMC179897  PMID: 12869211
9.  The rainbow trout genome provides novel insights into evolution after whole-genome duplication in vertebrates 
Nature Communications  2014;5:3657.
Vertebrate evolution has been shaped by several rounds of whole-genome duplications (WGDs) that are often suggested to be associated with adaptive radiations and evolutionary innovations. Due to an additional round of WGD, the rainbow trout genome offers a unique opportunity to investigate the early evolutionary fate of a duplicated vertebrate genome. Here we show that after 100 million years of evolution the two ancestral subgenomes have remained extremely collinear, despite the loss of half of the duplicated protein-coding genes, mostly through pseudogenization. In striking contrast is the fate of miRNA genes that have almost all been retained as duplicated copies. The slow and stepwise rediploidization process characterized here challenges the current hypothesis that WGD is followed by massive and rapid genomic reorganizations and gene deletions.
Although whole-genome duplications (WGDs) are rare events, they have an important role in shaping vertebrate evolution. Here, the authors sequence the rainbow trout genome and show that rediploidization after WGD occurs in a slow and stepwise manner.
PMCID: PMC4071752  PMID: 24755649
