It is important to understand the forces that shape the size and evolutionary histories of gene families. Here, we investigated the evolution of non–protein-coding RNA genes in the genomes of Caenorhabditis nematodes. We specifically focused on nested arrangements, that is, cases in which an RNA gene is entirely contained in an intron of another gene. Comparing these arrangements between species simplifies the inference of orthology and, therefore, of evolutionary fates of nested genes. Two distinct patterns are evident in the data. Genes encoding small nuclear RNAs (snRNAs) and transfer RNAs form large families, which have persisted since before the common ancestor of Metazoa. Yet, individual genes die relatively rapidly, with few orthologs having survived since the divergence of Caenorhabditis elegans and Caenorhabditis briggsae. In contrast, genes encoding small nucleolar RNAs (snoRNAs) are either single-copy or form small families. Individual snoRNAs turn over at a relatively slow rate—most C. elegans genes have clearly identifiable orthologs in C. briggsae. We also found that in Drosophila, genes from larger snRNA families die at a faster rate than their counterparts from single-gene families. These results suggest that a relationship between family size and the rate of gene turnover may be a general feature of genome evolution.
birth-and-death; gene family; evolution; small RNA; nested genes; C. elegans
The genomic architecture of the budding yeast Saccharomyces cerevisiae is typical of other eukaryotes in that genes are spatially organized into discrete and nonoverlapping units. Inherent in this organizational model is the assumption that protein-coding sequences do not overlap completely. Here, we present evidence to the contrary, defining a previously overlooked yeast gene, NAG1 (for nested antisense gene) nested entirely within the coding sequence of the YGR031W open reading frame in an antisense orientation on the opposite strand. NAG1 encodes a 19-kDa protein, detected by Western blotting of hemagglutinin (HA)-tagged Nag1p with anti-HA antibodies and by β-galactosidase analysis of a NAG1-lacZ fusion. NAG1 is evolutionarily conserved as a unit with YGR031W in bacteria and fungi. Unlike the YGR031WP protein product, however, which localizes to the mitochondria, Nag1p localizes to the cell periphery, exhibiting properties consistent with those of a plasma membrane protein. Phenotypic analysis of a site-directed mutant (nag1-1) disruptive for NAG1 but silent with respect to YGR031W, defines a role for NAG1 in yeast cell wall biogenesis; microarray profiling of nag1-1 indicates decreased expression of genes contributing to cell wall organization, and the nag1-1 mutant is hypersensitive to the cell wall-perturbing agent calcofluor white. Furthermore, production of Nag1p is dependent upon the presence of the cell wall integrity pathway mitogen-activated protein kinase Slt2p and its downstream transcription factor Rlm1p. Thus, NAG1 is important for two reasons. First, it contributes to yeast cell wall biogenesis. Second, its genomic context is novel, raising the possibility that other nested protein-coding genes may exist in eukaryotic genomes.
Transcriptional interference has been recently recognized as an unexpectedly complex and mostly negative regulation of genes. Despite a relatively few studies that emerged in recent years, it has been demonstrated that a readthrough transcription derived from one gene can influence the transcription of another overlapping or nested gene. However, the molecular effects resulting from this interaction are largely unknown.
Using in silico chromosome walking, we searched for prematurely terminated transcripts bearing signatures of intron retention or exonization of intronic sequence at their 3′ ends upstream to human L1 retrotransposons, protein-coding and noncoding nested genes. We demonstrate that transcriptional interference induced by intronic L1s (or other repeated DNAs) and nested genes could be characterized by intron retention, forced exonization and cryptic polyadenylation. These molecular effects were revealed from the analysis of endogenous transcripts derived from different cell lines and tissues and confirmed by the expression of three minigenes in cell culture. While intron retention and exonization were comparably observed in introns upstream to L1s, forced exonization was preferentially detected in nested genes. Transcriptional interference induced by L1 or nested genes was dependent on the presence or absence of cryptic splice sites, affected the inclusion or exclusion of the upstream exon and the use of cryptic polyadenylation signals.
Our results suggest that transcriptional interference induced by intronic L1s and nested genes could influence the transcription of the large number of genes in normal as well as in tumor tissues. Therefore, this type of interference could have a major impact on the regulation of the host gene expression.
Although the overlap of transcriptional units occurs frequently in eukaryotic genomes, its evolutionary and biological significance remains largely unclear. Here we report a comparative analysis of overlaps between genes coding for well-annotated proteins in five metazoan genomes (human, mouse, zebrafish, fruit fly and worm).
For all analyzed species the observed number of overlapping genes is always lower than expected assuming functional neutrality, suggesting that gene overlap is negatively selected. The comparison to the random distribution also shows that retained overlaps do not exhibit random features: antiparallel overlaps are significantly enriched, while overlaps lying on the same strand and those involving coding sequences are highly underrepresented. We confirm that overlap is mostly species-specific and provide evidence that it frequently originates through the acquisition of terminal, non-coding exons. Finally, we show that overlapping genes tend to be significantly co-expressed in a breast cancer cDNA library obtained by 454 deep sequencing, and that different overlap types display different patterns of reciprocal expression.
Our data suggest that overlap between protein-coding genes is selected against in Metazoa. However, when retained it may be used as a species-specific mechanism for the reciprocal regulation of neighboring genes. The tendency of overlaps to involve non-coding regions of the genes leads to the speculation that the advantages achieved by an overlapping arrangement may be optimized by evolving regulatory non-coding transcripts.
The repetitive DNA that constitutes most of the heterochromatic regions of metazoan genomes has hindered the comprehensive analysis of gene content and other functions. We have generated a detailed computational and manual annotation of 24 megabases of heterochromatic sequence in the Release 5 Drosophila melanogaster genome sequence. The heterochromatin contains a minimum of 230 to 254 protein-coding genes, which are conserved in other Drosophilids and more diverged species, as well as 32 pseudogenes and 13 noncoding RNAs. Improved methods revealed that more than 77% of this heterochromatin sequence, including introns and intergenic regions, is composed of fragmented and nested transposable elements and other repeated DNAs. Drosophila heterochromatin contains “islands” of highly conserved genes embedded in these “oceans” of complex repeats, which may require special expression and splicing mechanisms.
The presence of introns in protein-coding genes is a universal feature of eukaryotic genome organization, and the genes of multicellular eukaryotes, typically, contain multiple introns, a substantial fraction of which share position in distant taxa, such as plants and animals. Depending on the methods and data sets used, researchers have reached opposite conclusions on the causes of the high fraction of shared introns in orthologous genes from distant eukaryotes. Some studies conclude that shared intron positions reflect, almost entirely, a remarkable evolutionary conservation, whereas others attribute it to parallel gain of introns. To resolve these contradictions, it is crucial to analyze the evolution of introns by using a model that minimally relies on arbitrary assumptions.
We developed a probabilistic model of evolution that allows for variability of intron gain and loss rates over branches of the phylogenetic tree, individual genes, and individual sites. Applying this model to an extended set of conserved eukaryotic genes, we find that parallel gain, on average, accounts for only ~8% of the shared intron positions. However, the distribution of parallel gains over the phylogenetic tree of eukaryotes is highly non-uniform. There are, practically, no parallel gains in closely related lineages, whereas for distant lineages, such as animals and plants, parallel gains appear to contribute up to 20% of the shared intron positions. In accord with these findings, we estimated that ancestral introns have a high probability to be retained in extant genomes, and conversely, that a substantial fraction of extant introns have retained their positions since the early stages of eukaryotic evolution. In addition, the density of sites that are available for intron insertion is estimated to be, approximately, one in seven basepairs.
We obtained robust estimates of the contribution of parallel gain to the observed sharing of intron positions between eukaryotic species separated by different evolutionary distances. The results indicate that, although the contribution of parallel gains varies across the phylogenetic tree, the high level of intron position sharing is due, primarily, to evolutionary conservation. Accordingly, numerous introns appear to persist in the same position over hundreds of millions of years of evolution. This is compatible with recent observations of a negative correlation between the rate of intron gain and coding sequence evolution rate of a gene, suggesting that at least some of the introns are functionally relevant.
We have developed a web application for the detailed analysis and visualization of evolutionary sequence conservation in complex vertebrate genes. Given a pair of orthologous genes, the protein-coding sequences are aligned. When these sequences are mapped back onto their encoding exons in the genomes, a scaffold of the conserved gene structure naturally emerges. Sequence similarity between exons and introns is analysed and embedded into the gene structure scaffold. The visualization on the SVC server provides detailed information about evolutionarily conserved features of these genes. It further allows concise representation of complex splice patterns in the context of evolutionary conservation. A particular application of our tool arises from the fact that around mRNA editing sites both exonic and intronic sequences are highly conserved. This aids in delineation of these sites. SVC is available at .
The MITOchondrial genome database of metaZOAns (MitoZoa) is a public resource for comparative analyses of metazoan mitochondrial genomes (mtDNA) at both the sequence and genomic organizational levels. The main characteristics of the MitoZoa database are the careful revision of mtDNA entry annotations and the possibility of retrieving gene order and non-coding region (NCR) data in appropriate formats. The MitoZoa retrieval system enables basic and complex queries at various taxonomic levels using different search menus. MitoZoa 2.0 has been enhanced in several aspects, including: a re-annotation pipeline to check the correctness of protein-coding gene predictions; a standardized annotation of introns and of precursor ORFs whose functionality is post-transcriptionally recovered by RNA editing or programmed translational frameshifting; updates of taxon-related fields and a BLAST sequence similarity search tool. Database novelties and the definition of standard mtDNA annotation rules, together with the user-friendly retrieval system and the BLAST service, make MitoZoa a valuable resource for comparative and evolutionary analyses as well as a reference database to assist in the annotation of novel mtDNA sequences. MitoZoa is freely accessible at http://www.caspur.it/mitozoa.
Genome-wide studies of intron dynamics in mammalian orthologous genes have found convincing evidence for loss of introns but very little for intron turnover. Similarly, large-scale analysis of intron dynamics in a few vertebrate genomes has identified only intron losses and no gains, indicating that intron gain is an extremely rare event in vertebrate evolution. These studies suggest that the intron-rich genomes of vertebrates do not allow intron gain. The aim of this study was to search for evidence of de novo intron gain in domesticated genes from an analysis of their exon/intron structures.
A phylogenomic approach has been used to analyse all domesticated genes in mammals and chordates that originated from the coding parts of transposable elements. Gain of introns in domesticated genes has been reconstructed on well established mammalian, vertebrate and chordate phylogenies, and examined as to where and when the gain events occurred. The locations, sizes and amounts of de novo introns gained in the domesticated genes during the evolution of mammals and chordates has been analyzed. A significant amount of intron gain was found only in domesticated genes of placental mammals, where more than 70 cases were identified. De novo gained introns show clear positional bias, since they are distributed mainly in 5' UTR and coding regions, while 3' UTR introns are very rare. In the coding regions of some domesticated genes up to 8 de novo gained introns have been found. Intron densities in Eutheria-specific domesticated genes and in older domesticated genes that originated early in vertebrates are lower than those for normal mammalian and vertebrate genes. Surprisingly, the majority of intron gains have occurred in the ancestor of placentals.
This study provides the first evidence for numerous intron gains in the ancestor of placental mammals and demonstrates that adequate taxon sampling is crucial for reconstructing intron evolution. The findings of this comprehensive study slightly challenge the current view on the evolutionary stasis in intron dynamics during the last 100 - 200 My. Domesticated genes could constitute an excellent system on which to analyse the mechanisms of intron gain in placental mammals.
Reviewers: this article was reviewed by Dan Graur, Eugene V. Koonin and Jürgen Brosius.
Transposed elements (TEs) are mobile genetic sequences. During the evolution of eukaryotes TEs were inserted into active protein-coding genes, affecting gene structure, expression and splicing patterns, and protein sequences. Genomic insertions of TEs also led to creation and expression of new functional non-coding RNAs such as microRNAs. We have constructed the TranspoGene database, which covers TEs located inside protein-coding genes of seven species: human, mouse, chicken, zebrafish, fruit fly, nematode and sea squirt. TEs were classified according to location within the gene: proximal promoter TEs, exonized TEs (insertion within an intron that led to exon creation), exonic TEs (insertion into an existing exon) or intronic TEs. TranspoGene contains information regarding specific type and family of the TEs, genomic and mRNA location, sequence, supporting transcript accession and alignment to the TE consensus sequence. The database also contains host gene specific data: gene name, genomic location, Swiss-Prot and RefSeq accessions, diseases associated with the gene and splicing pattern. In addition, we created microTranspoGene: a database of human, mouse, zebrafish and nematode TE-derived microRNAs. The TranspoGene and microTranspoGene databases can be used by researchers interested in the effect of TE insertion on the eukaryotic transcriptome. Publicly available query interfaces to TranspoGene and microTranspoGene are available at http://transpogene.tau.ac.il/ and http://microtranspogene.tau.ac.il, respectively. The entire database can be downloaded as flat files.
Most eukaryotes have at least some genes interrupted by introns. While it is well
accepted that introns were already present at moderate density in the last
eukaryote common ancestor, the conspicuous diversity of intron density among
genomes suggests a complex evolutionary history, with marked differences between
phyla. The question of the rates of intron gains and loss in the course of
evolution and factors influencing them remains controversial. We have
investigated a single gene family, alpha-amylase, in 55 species covering a
variety of animal phyla. Comparison of intron positions across phyla suggests a
complex history, with a likely ancestral intronless gene undergoing frequent
intron loss and gain, leading to extant intron/exon structures that are highly
variable, even among species from the same phylum. Because introns are known to
play no regulatory role in this gene and there is no alternative splicing, the
structural differences may be interpreted more easily: intron positions, sizes,
losses or gains may be more likely related to factors linked to splicing
mechanisms and requirements, and to recognition of introns and exons, or to more
extrinsic factors, such as life cycle and population size. We have shown that
intron losses outnumbered gains in recent periods, but that “resets”
of intron positions occurred at the origin of several phyla, including
vertebrates. Rates of gain and loss appear to be positively correlated. No phase
preference was found. We also found evidence for parallel gains and for intron
sliding. Presence of introns at given positions was correlated to a strong
protosplice consensus sequence AG/G, which was much weaker in the absence of
intron. In contrast, recent intron insertions were not associated with a
specific sequence. In animal Amy genes, population size and
generation time seem to have played only minor roles in shaping gene
Protein-coding genes in eukaryotes are interrupted by introns, but intron densities widely differ between eukaryotic lineages. Vertebrates, some invertebrates and green plants have intron-rich genes, with 6–7 introns per kilobase of coding sequence, whereas most of the other eukaryotes have intron-poor genes. We reconstructed the history of intron gain and loss using a probabilistic Markov model (Markov Chain Monte Carlo, MCMC) on 245 orthologous genes from 99 genomes representing the three of the five supergroups of eukaryotes for which multiple genome sequences are available. Intron-rich ancestors are confidently reconstructed for each major group, with 53 to 74% of the human intron density inferred with 95% confidence for the Last Eukaryotic Common Ancestor (LECA). The results of the MCMC reconstruction are compared with the reconstructions obtained using Maximum Likelihood (ML) and Dollo parsimony methods. An excellent agreement between the MCMC and ML inferences is demonstrated whereas Dollo parsimony introduces a noticeable bias in the estimations, typically yielding lower ancestral intron densities than MCMC and ML. Evolution of eukaryotic genes was dominated by intron loss, with substantial gain only at the bases of several major branches including plants and animals. The highest intron density, 120 to 130% of the human value, is inferred for the last common ancestor of animals. The reconstruction shows that the entire line of descent from LECA to mammals was intron-rich, a state conducive to the evolution of alternative splicing.
In eukaryotes, protein-coding genes are interrupted by non-coding introns. The intron densities widely differ, from 6–7 introns per kilobase of coding sequence in vertebrates, some invertebrates and plants, to only a few introns across the entire genome in many unicellular forms. We applied a robust statistical methodology, Markov Chain Monte Carlo, to reconstruct the history of intron gain and loss throughout the evolution of eukaryotes using a set of 245 homologous genes from 99 genomes that represent the diversity of eukaryotes. Intron-rich ancestors were confidently inferred for each major eukaryotic group including 53% to 74% of the human intron density for the last eukaryotic common ancestor, and 120% to 130% of the human value for the last common ancestor of animals. Evolution of eukaryotic genes involved primarily intron loss, with substantial gain only at the bases of several major branches including plants and animals. Thus, the common ancestor of all extant eukaryotes was a complex organism with a gene architecture resembling those in multicellular organisms. The line of descent from the last common ancestor to mammals was an uninterrupted intron-rich state that, given the error-prone splicing in intron-rich organisms, was conducive to the elaboration of functional alternative splicing.
Alternative splicing of mutually exclusive exons is an important mechanism for increasing protein diversity in eukaryotes. The insect Mhc (myosin heavy chain) gene produces all different muscle myosins as a result of alternative splicing in contrast to most other organisms of the Metazoa lineage, that have a family of muscle genes with each gene coding for a protein specialized for a functional niche.
The muscle myosin heavy chain genes of 22 species of the Arthropoda ranging from the waterflea to wasp and Drosophila have been annotated. The analysis of the gene structures allowed the reconstruction of an ancient muscle myosin heavy chain gene and showed that during evolution of the arthropods introns have mainly been lost in these genes although intron gain might have happened in a few cases. Surprisingly, the genome of Aedes aegypti contains another and that of Culex pipiens quinquefasciatus two further muscle myosin heavy chain genes, called Mhc3 and Mhc4, that contain only one variant of the corresponding alternative exons of the Mhc1 gene. Mhc3 transcription in Aedes aegypti is documented by EST data. Mhc3 and Mhc4 inserted in the Aedes and Culex genomes either by gene duplication followed by the loss of all but one variant of the alternative exons, or by incorporation of a transcript of which all other variants have been spliced out retaining the exon-intron structure. The second and more likely possibility represents a new type of a 'partially' processed pseudogene.
Based on the comparative genomic analysis of the alternatively spliced arthropod muscle myosin heavy chain genes we propose that the splicing process operates sequentially on the transcript. The process consists of the splicing of the mutually exclusive exons until one exon out of the cluster remains while retaining surrounding intronic sequence. In a second step splicing of introns takes place. A related mechanism could be responsible for the splicing of other genes containing mutually exclusive exons.
The GMC oxidoreductases comprise a large family of diverse FAD enzymes that share a homologous backbone. The relationship and origin of the GMC oxidoreductase genes, however, was unknown. Recent sequencing of entire genomes has allowed for the evolutionary analysis of the GMC oxidoreductase family.
Although genes that encode enzyme families are rarely linked in higher eukaryotes, we discovered that the majority of the GMC oxidoreductase genes in the fruit fly (D. melanogaster), mosquito (A. gambiae), honeybee (A. mellifera), and flour beetle (T. castaneum) are located in a highly conserved cluster contained within a large intron of the flotillin-2 (Flo-2) gene. In contrast, the genomes of vertebrates and the nematode C. elegans contain few GMC genes and lack a GMC cluster, suggesting that the GMC cluster and the function of its resident genes are unique to insects or arthropods. We found that the development patterns of expression of the GMC cluster genes are highly complex. Among the GMC oxidoreductases located outside of the GMC gene cluster, the identities of two related enzymes, glucose dehydrogenase (GLD) and glucose oxidase (GOX), are known, and they play major roles in development and immunity. We have discovered that several additional GLD and GOX homologues exist in insects but are remotely similar to fungal GOX.
We speculate that the GMC oxidoreductase cluster has been conserved to coordinately regulate these genes for a common developmental or physiological function related to ecdysteroid metabolism. Furthermore, we propose that the GMC gene cluster may be the birthplace of the insect GMC oxidoreductase genes. Through tandem duplication and divergence within the cluster, new GMC genes evolved. Some of the GMC genes have been retained in the cluster for hundreds of millions of years while others might have transposed to other regions of the genome. Consistent with this hypothesis, our analysis indicates that insect GOX and GLD arose from a different ancestral GMC gene than that of fungal GOX.
An appreciable fraction of introns is thought to have some function, but there is no obvious way to predict which specific intron is likely to be functional. We hypothesize that functional introns experience a different selection regime than non-functional ones and will therefore show distinct evolutionary histories. In particular, we expect functional introns to be more resistant to loss, and that this would be reflected in high conservation of their position with respect to the coding sequence. To test this hypothesis, we focused on introns whose function comes about from microRNAs and snoRNAs that are embedded within their sequence. We built a data set of orthologous genes across 28 eukaryotic species, reconstructed the evolutionary histories of their introns and compared functional introns with the rest of the introns. We found that, indeed, the position of microRNA- and snoRNA-bearing introns is significantly more conserved. In addition, we found that both families of RNA genes settled within introns early during metazoan evolution. We identified several easily computable intronic properties that can be used to detect functional introns in general, thereby suggesting a new strategy to pinpoint non-coding cellular functions.
To analyse the evolutionary emergence of structural complexity in physical processes, we introduce a general, but tractable, model of objects that interact to produce new objects. Since the objects—ϵ-machines—have well-defined structural properties, we demonstrate that complexity in the resulting population dynamical system emerges on several distinct organizational scales during evolution—from individuals to nested levels of mutually self-sustaining interaction. The evolution to increased organization is dominated by the spontaneous creation of structural hierarchies and this, in turn, is facilitated by the innovation and maintenance of relatively low-complexity, but general individuals.
self-organization; evolution; population dynamics; autocatalysis; structure; hierarchy
Spliceosomal introns are an ancient, widespread hallmark of eukaryotic genomes. Despite much research, many questions regarding the origin and evolution of spliceosomal introns remain unsolved, partly due to the difficulty of inferring ancestral gene structures. We circumvent this problem by using genes originated by endosymbiotic gene transfer, in which an intron-less structure at the time of the transfer can be assumed.
By comparing the exon-intron structures of 64 mitochondrial-derived genes that were transferred to the nucleus at different evolutionary periods, we can trace the history of intron gains in different eukaryotic lineages. Our results show that the intron density of genes transferred relatively recently to the nuclear genome is similar to that of genes originated by more ancient transfers, indicating that gene structure can be rapidly shaped by intron gain after the integration of the gene into the genome and that this process is mainly determined by forces acting specifically on each lineage. We analyze 12 cases of mitochondrial-derived genes that have been transferred to the nucleus independently in more than one lineage.
Remarkably, the proportion of shared intron positions that were gained independently in homologous genes is similar to that proportion observed in genes that were transferred prior to the speciation event and whose shared intron positions might be due to vertical inheritance. A particular case of parallel intron gain in the nad7 gene is discussed in more detail.
Sequential polyandry may evolve as an insurance mechanism to reduce the risk of choosing a mate that is infertile, closely related, genetically inferior or incompatible, but polyandry also might insure against nest failure in unpredictable environments. Most animals are oviparous, and in species where males provide nest sites whose quality varies substantially and unpredictably, polyandrous females might insure offspring success by distributing their eggs across multiple nests. Here, we test this hypothesis in a wild population of an Australian terrestrial toadlet, a polyandrous species, where males construct nests and remain with broods. We found that females partitioned their eggs across the nests of two to eight males and that more polyandrous females gained a significant increase in mean offspring survivorship. Our results provide evidence for the most extreme case of sequential polyandry yet discovered in a vertebrate and also suggest that insurance against nest failure might favour the evolution of polyandry. We propose that insurance against nest failure might be widespread among oviparous taxa and provide an important explanation for the prevalence of sequential polyandry in nature.
sexual selection; polyandry; frog
Genome-based studies of metazoan evolution are most informative when phylogenetically diverse species are incorporated in the analysis. As such, evolutionary trends within and outside the phylum Nematoda have been less revealing by focusing only on comparisons involving Caenorhabditis elegans. Herein, we present a draft of the 64 megabase nuclear genome of Trichinella spiralis, containing 15,808 protein coding genes. This parasitic nematode is an extant member of a clade that diverged early in the evolution of the phylum enabling identification of archetypical genes and molecular signatures exclusive to nematodes. Comparative analyses support intrachromosomal rearrangements across the phylum, disproportionate numbers of protein family deaths over births in parasitic vs. a non-parasitic nematode, and a preponderance of gene loss and gain events in nematodes relative to Drosophila melanogaster. This sequence and the panphylum characteristics identified herein will advance evolutionary studies and strategies to combat global parasites of humans, food animals and crops.
All sequenced eukaryotic genomes have been shown to possess at least a few introns. This includes those unicellular organisms, which were previously suspected to be intron-less. Therefore, gene splicing must have been present at least in the last common ancestor of the eukaryotes. To explain the evolution of introns, basically two mutually exclusive concepts have been developed. The introns-early hypothesis says that already the very first protein-coding genes contained introns while the introns-late concept asserts that eukaryotic genes gained introns only after the emergence of the eukaryotic lineage. A very important aspect in this respect is the conservation of intron positions within homologous genes of different taxa.
GenePainter is a standalone application for mapping gene structure information onto protein multiple sequence alignments. Based on the multiple sequence alignments the gene structures are aligned down to single nucleotides. GenePainter accounts for variable lengths in exons and introns, respects split codons at intron junctions and is able to handle sequencing and assembly errors, which are possible reasons for frame-shifts in exons and gaps in genome assemblies. Thus, even gene structures of considerably divergent proteins can properly be compared, as it is needed in phylogenetic analyses. Conserved intron positions can also be mapped to user-provided protein structures. For their visualization GenePainter provides scripts for the molecular graphics system PyMol.
GenePainter is a tool to analyse gene structure conservation providing various visualization options. A stable version of GenePainter for all operating systems as well as documentation and example data are available at http://www.motorprotein.de/genepainter.html.
Exon; Intron; Gene structure; Evolution
Genomes of higher eukaryotes are mosaics of segments with various structural, functional, and evolutionary properties. The availability of whole-genome sequences allows the investigation of their structure as “texts” using different statistical and computational methods. One such method, referred to as Compositional Spectra (CS) analysis, is based on scoring the occurrences of fixed-length oligonucleotides (k-mers) in the target DNA sequence. CS analysis allows generating species- or region-specific characteristics of the genome, regardless of their length and the presence of coding DNA. In this study, we consider the heterogeneity of vertebrate genomes as a joint effect of regional variation in sequence organization superimposed on the differences in nucleotide composition. We estimated compositional and organizational heterogeneity of genome and chromosome sequences separately and found that both heterogeneity types vary widely among genomes as well as among chromosomes in all investigated taxonomic groups. The high correspondence of heterogeneity scores obtained on three genome fractions, coding, repetitive, and the remaining part of the noncoding DNA (the genome dark matter - GDM) allows the assumption that CS-heterogeneity may have functional relevance to genome regulation. Of special interest for such interpretation is the fact that natural GDM sequences display the highest deviation from the corresponding reshuffled sequences.
Hatching enzyme, belonging to the astacin metallo-protease family, digests egg envelope at embryo hatching. Orthologous genes of the enzyme are found in all vertebrate genomes. Recently, we found that exon-intron structures of the genes were conserved among tetrapods, while the genes of teleosts frequently lost their introns. Occurrence of such intron losses in teleostean hatching enzyme genes is an uncommon evolutionary event, as most eukaryotic genes are generally known to be interrupted by introns and the intron insertion sites are conserved from species to species. Here, we report on extensive studies of the exon-intron structures of teleostean hatching enzyme genes for insight into how and why introns were lost during evolution.
We investigated the evolutionary pathway of intron-losses in hatching enzyme genes of 27 species of Teleostei. Hatching enzyme genes of basal teleosts are of only one type, which conserves the 9-exon-8-intron structure of an assumed ancestor. On the other hand, otocephalans and euteleosts possess two types of hatching enzyme genes, suggesting a gene duplication event in the common ancestor of otocephalans and euteleosts. The duplicated genes were classified into two clades, clades I and II, based on phylogenetic analysis. In otocephalans and euteleosts, clade I genes developed a phylogeny-specific structure, such as an 8-exon-7-intron, 5-exon-4-intron, 4-exon-3-intron or intron-less structure. In contrast to the clade I genes, the structures of clade II genes were relatively stable in their configuration, and were similar to that of the ancestral genes. Expression analyses revealed that hatching enzyme genes were high-expression genes, when compared to that of housekeeping genes. When expression levels were compared between clade I and II genes, clade I genes tends to be expressed more highly than clade II genes.
Hatching enzyme genes evolved to lose their introns, and the intron-loss events occurred at the specific points of teleostean phylogeny. We propose that the high-expression hatching enzyme genes frequently lost their introns during the evolution of teleosts, while the low-expression genes maintained the exon-intron structure of the ancestral gene.
A long-standing assumption in evolutionary biology is that the evolution rate of protein-coding genes depends, largely, on specific constraints that affect the function of the given protein. However, recent research in evolutionary systems biology revealed unexpected, significant correlations between evolution rate and characteristics of genes or proteins that are not directly related to specific protein functions, such as expression level and protein–protein interactions. The strongest connections were consistently detected between protein sequence evolution rate and the expression level of the respective gene. A recent genome-wide proteomic study revealed an extremely strong correlation between the abundances of orthologous proteins in distantly related animals, the nematode Caenorhabditis elegans and the fruit fly Drosophila melanogaster. We used the extensive protein abundance data from this study along with short-term evolutionary rates (ERs) of orthologous genes in nematodes and flies to estimate the relative contributions of structural–functional constraints and the translation rate to the evolution rate of protein-coding genes. Together the intrinsic constraints and translation rate account for approximately 50% of the variance of the ERs. The contribution of constraints is estimated to be 3- to 5-fold greater than the contribution of translation rate.
protein evolution; structural–functional constraints; misfolding; protein abundance
Mitochondria are descendants of the endosymbiotic α-proteobacterium most likely engulfed by the ancestral eukaryotic cells, and the proto-mitochondrial genome should have been severely streamlined in terms of both genome size and gene repertoire. In addition, mitochondrial (mt) sequence data indicated that frequent intron gain/loss events contributed to shaping the modern mt genome organizations, resulting in the homologous introns being shared between two distantly related mt genomes. Unfortunately, the bulk of mt sequence data currently available are of phylogenetically restricted lineages, i.e., metazoans, fungi, and land plants, and are insufficient to elucidate the entire picture of intron evolution in mt genomes. In this work, we sequenced a 12 kbp-fragment of the mt genome of the katablepharid Leucocryptos marina. Among nine protein-coding genes included in the mt genome fragment, the genes encoding cytochrome b and cytochrome c oxidase subunit I (cob and cox1) were interrupted by group I introns. We further identified that the cob and cox1 introns host open reading frames for homing endonucleases (HEs) belonging to distantly related superfamilies. Phylogenetic analyses recovered an affinity between the HE in the Leucocryptos cob intron and two green algal HEs, and that between the HE in the Leucocryptos cox1 intron and a fungal HE, suggesting that the Leucocryptos cob and cox1 introns possess distinct evolutionary origins. Although the current intron (and intronic HE) data are insufficient to infer how the homologous introns were distributed to distantly related mt genomes, the results presented here successfully expanded the evolutionary dynamism of group I introns in mt genomes.
The mechanisms and evolutionary dynamics of intron insertion and loss in eukaryotic genes remain poorly understood. Reconstruction of parsimonious scenarios of gene structure evolution in paralogous gene families in animals and plants revealed numerous gains and losses of introns. In all analyzed lineages, the number of acquired new introns was substantially greater than the number of lost ancestral introns. This trend held even for lineages in which vertical evolution of genes involved more intron losses than gains, suggesting that gene duplication boosts intron insertion. However, dating gene duplications and the associated intron gains and losses based on the molecular clock assumption showed that very few, if any, introns were gained during the last ∼100 million years of animal and plant evolution, in agreement with previous conclusions reached through analysis of orthologous gene sets. These results are generally compatible with the emerging notion of intensive insertion and loss of introns during transitional epochs in contrast to the relative quiet of the intervening evolutionary spans.