Several attributes intuitively considered to be typical mammalian features, such as complex behavior, live birth, and malignant diseases like cancer, also appeared several times independently in so-called “lower” vertebrates. The genetic mechanisms underlying the evolution of these elaborate traits are poorly understood. The platyfish, Xiphophorus maculatus, offers a unique model to better understand the molecular biology of such traits. Here we describe the sequencing of the platyfish genome. Integrating the genome assembly with extensive genetic maps revealed that fish, in contrast to mammals, exhibit unexpected evolutionary stability of chromosomes. Genes associated with viviparity show signatures of positive selection, identifying new putative functional domains and rare cases of parallel evolution. We also discovered that genes implicated in cognition show an unexpectedly high rate of duplicate gene retention after the teleost genome duplication. This suggests a hypothesis for the evolution of the great behavioral complexity in fish, which exceeds that in amphibians and reptiles.
We present the draft 273 Mb genome of the migratory monarch butterfly (Danaus plexippus) and a set of 16,866 protein-coding genes. Orthology properties suggest that the Lepidoptera are the fastest evolving insect order yet examined. Compared to the silkmoth Bombyx mori, the monarch genome shares prominent similarity in orthology content, microsynteny, and protein family sizes. The monarch genome reveals: a vertebrate-like opsin whose existence in insects is widespread; a full repertoire of molecular components for the monarch circadian clockwork; all members of the juvenile hormone biosynthetic pathway whose regulation shows unexpected sexual dimorphism; additional molecular signatures of oriented flight behavior; microRNAs that are differentially expressed between summer and migratory butterflies; monarch-specific expansions of chemoreceptors potentially important for long-distance migration; and a variant of the sodium/potassium pump that underlies a valuable chemical defense mechanism. The monarch genome enhances our ability to understand the genetic and molecular basis of long-distance migration.
Divergence within cis-regulatory sequences may contribute to the adaptive evolution of gene expression, but functional alleles in these regions are difficult to identify without abundant genomic resources. Among African cichlid fishes, the differential expression of seven opsin genes has produced adaptive differences in visual sensitivity. Quantitative genetic analysis suggests that cis-regulatory alleles near the SWS2-LWS opsins may contribute to this variation. Here, we sequence BACs containing the opsin genes of two cichlids, Oreochromis niloticus and Metriaclima zebra. We use phylogenetic footprinting and shadowing to examine divergence in conserved non-coding elements, promoter sequences, and 3'-UTRs surrounding each opsin in search of candidate cis-regulatory sequences that influence cichlid opsin expression.
We identified 20 conserved non-coding elements surrounding the opsins of cichlids and other teleosts, including one known enhancer and a retinal microRNA. Most conserved elements contained computationally predicted binding sites that correspond to transcription factors that function in vertebrate opsin expression; O. niloticus and M. zebra were significantly divergent in two of these. Similarly, we found a large number of relevant transcription factor binding sites within each opsin's proximal promoter, and identified five opsins that were considerably divergent in both expression and the number of transcription factor binding sites shared between O. niloticus and M. zebra. We also found several microRNA target sites within the 3'-UTR of each opsin, including two 3'-UTRs that differ significantly between O. niloticus and M. zebra. Finally, we examined interspecific divergence among 18 phenotypically diverse cichlids from Lake Malawi for one conserved non-coding element, two 3'-UTRs, and five opsin proximal promoters. We found that all regions were highly conserved, with some evidence of CRX transcription factor binding site turnover. We also found three SNPs, within two opsin promoters and one non-coding element, that were weakly associated with cichlid opsin expression.
This study is the first to systematically search the opsins of cichlids for putative cis-regulatory sequences. Although many putative regulatory regions are highly conserved across a large number of phenotypically diverse cichlids, we found at least nine divergent sequences that could contribute to opsin expression differences in cis and stand out as candidates for future functional analyses.
Despite considerable progress in our understanding of land plant phylogeny, several nodes in the green tree of life remain poorly resolved. Furthermore, the bulk of currently available data come from only a subset of major land plant clades. Here we examine early land plant evolution using complete plastome sequences including two previously unexamined and phylogenetically critical lineages. To better understand the evolution of land plants and their plastomes, we examined aligned nucleotide sequences, indels, gene and nucleotide composition, inversions, and gene order at the boundaries of the inverted repeats.
We present the plastome sequences of Equisetum arvense, a horsetail, and of Isoetes flaccida, a heterosporous lycophyte. Phylogenetic analysis of aligned nucleotides from 49 plastome genes from 43 taxa supported monophyly for the following clades: embryophytes (land plants), lycophytes, monilophytes (leptosporangiate ferns + Angiopteris evecta + Psilotum nudum + Equisetum arvense), and seed plants. Resolution among the four monilophyte lineages remained moderate, although nucleotide analyses suggested that P. nudum and E. arvense form a clade sister to A. evecta + leptosporangiate ferns. Results from phylogenetic analyses of nucleotides were consistent with the distribution of plastome gene rearrangements and with analysis of sequence gaps resulting from insertions and deletions (indels). We found one new indel and an inversion of a block of genes that unites the monilophytes.
Monophyly of monilophytes has been disputed on the basis of morphological and fossil evidence. In the context of a broad sampling of land plant data we find several new pieces of evidence for monilophyte monophyly. Results from this study demonstrate resolution among the four monilophyte lineages, albeit with moderate support; we posit a clade consisting of Equisetaceae and Psilotaceae that is sister to the "true ferns," including Marattiaceae.
Widespread sampling of vertebrates, which comprise the majority of published animal mitochondrial genomes, has led to the view that mitochondrial gene rearrangements are relatively rare, and that gene orders are typically stable across major taxonomic groups. In contrast, more limited sampling within the Phylum Mollusca has revealed an unusually high number of gene order arrangements. Here we provide evidence that the lability of the molluscan mitochondrial genome extends to the family level by describing extensive gene order changes that have occurred within the Vermetidae, a family of sessile marine gastropods that radiated from a basal caenogastropod stock during the Cenozoic Era.
Major mitochondrial gene rearrangements have occurred within this family at a scale unexpected for such an evolutionarily young group and unprecedented for any caenogastropod examined to date. We determined the complete mitochondrial genomes of four species (Dendropoma maximum, D. gregarium, Eualetes tulipa, and Thylacodes squamigerus) and the partial mitochondrial genomes of two others (Vermetus erectus and Thylaeodus sp.). Each of the six vermetid gastropods assayed possessed a unique gene order. In addition to the typical mitochondrial genome complement of 37 genes, additional tRNA genes were evident in D. gregarium (trnK) and Thylacodes squamigerus (trnV, trnLUUR). Three pseudogenes and additional tRNAs found within the genome of Thylacodes squamigerus provide evidence of a past duplication event in this taxon. Likewise, high sequence similarities between isoaccepting leucine tRNAs in Thylacodes, Eualetes, and Thylaeodus suggest that tRNA remolding has been rife within this family. While vermetids exhibit gene arrangements diagnostic of this family, they also share arrangements with littorinimorph caenogastropods, with which they have been linked based on sperm morphology and primary sequence-based phylogenies.
We have uncovered major changes in gene order within a family of caenogastropod molluscs that are indicative of a highly dynamic mitochondrial genome. Studies of mitochondrial genomes at such low taxonomic levels should help to illuminate the dynamics of gene order change, since the telltale vestiges of gene duplication, translocation, and remolding have not yet been erased entirely. Likewise, gene order characters may improve phylogenetic hypotheses at finer taxonomic levels than once anticipated and aid in investigating the conditions under which sequence-based phylogenies lack resolution or prove misleading.
Pythium ultimum is a ubiquitous oomycete plant pathogen responsible for a variety of diseases on a broad range of crop and ornamental species.
The P. ultimum genome (42.8 Mb) encodes 15,290 genes and has extensive sequence similarity and synteny with related Phytophthora species, including the potato blight pathogen Phytophthora infestans. Whole transcriptome sequencing revealed expression of 86% of genes, with detectable differential expression of suites of genes under abiotic stress and in the presence of a host. The predicted proteome includes a large repertoire of proteins involved in plant pathogen interactions, although, surprisingly, the P. ultimum genome does not encode any classical RXLR effectors and relatively few Crinkler genes in comparison to related phytopathogenic oomycetes. Fewer enzymes involved in carbohydrate metabolism were present than in Phytophthora species, with the notable absence of cutinases, suggesting a significant difference in virulence mechanisms between P. ultimum and more host-specific oomycete species. Although we observed a high degree of orthology with Phytophthora genomes, there were novel features of the P. ultimum proteome, including an expansion of genes involved in proteolysis and genes unique to Pythium. We identified a small gene family of cadherins, proteins involved in cell adhesion; this is the first report of cadherins in a genome outside the metazoans.
Access to the P. ultimum genome has revealed not only core pathogenic mechanisms within the oomycetes but also lineage-specific genes associated with the alternative virulence and lifestyles found within the pythiaceous lineages compared to the Peronosporaceae.
Ecdysozoa is the recently recognized clade of molting animals that comprises the vast majority of extant animal species and the most important invertebrate model organisms—the fruit fly and the nematode worm. Evolutionary relationships within the ecdysozoans remain, however, unresolved, impairing the correct interpretation of comparative genomic studies. In particular, the affinities of the three Panarthropoda phyla (Arthropoda, Onychophora, and Tardigrada) and the position of Myriapoda within Arthropoda (Mandibulata vs. Myriochelata hypothesis) are among the most contentious issues in animal phylogenetics.
To elucidate these relationships, we have determined and analyzed complete or nearly complete mitochondrial genome sequences of two Tardigrada, Hypsibius dujardini and Thulinia sp. (the first genomes to date for this phylum); one Priapulida, Halicryptus spinulosus; and two Onychophora, Peripatoides sp. and Epiperipatus biolleyi; and a partial mitochondrial genome sequence of the Onychophora Euperipatoides kanagrensis. Tardigrada mitochondrial genomes resemble those of the arthropods in terms of gene order and strand asymmetry, whereas Onychophora genomes are characterized by numerous gene order rearrangements and strand asymmetry variations. In addition, Onychophora genomes are extremely enriched in A and T nucleotides, whereas Priapulida and Tardigrada are more balanced.
Phylogenetic analyses based on concatenated amino acid coding sequences support a monophyletic origin of the Ecdysozoa and the position of Priapulida as the sister group of a monophyletic Panarthropoda (Tardigrada plus Onychophora plus Arthropoda). The position of Tardigrada is more problematic, most likely because of long branch attraction (LBA). However, experiments designed to reduce LBA suggest that the most likely placement of Tardigrada is as a sister group of Onychophora. The same analyses also recover monophyly of traditionally recognized arthropod lineages such as Arachnida and of the highly debated clade Mandibulata.
Arthropoda; mitogenomics; long branch attraction; Tardigrada; phylogeny; strand asymmetry
Tortula ruralis, a widely distributed species in the moss family Pottiaceae, is increasingly used as a model organism for the study of desiccation tolerance and mechanisms of cellular repair. In this paper, we present the chloroplast genome sequence of T. ruralis, only the second published chloroplast genome for a moss, and the first for a vegetatively desiccation-tolerant plant.
The Tortula chloroplast genome is ~123,500 bp, and differs in a number of ways from that of Physcomitrella patens, the first published moss chloroplast genome. For example, Tortula lacks the ~71 kb inversion found in the large single copy region of the Physcomitrella genome and other members of the Funariales. Also, the Tortula chloroplast genome lacks petN, a gene found in all known land plant plastid genomes. In addition, an unusual case of nucleotide polymorphism was discovered.
Although the chloroplast genome of Tortula ruralis differs from that of the only other sequenced moss, Physcomitrella patens, we have yet to determine the biological significance of the differences. The polymorphisms we have uncovered in sequencing the genome offer a rare opportunity (for mosses) to generate DNA markers for fine-level phylogenetic studies or to investigate individual variation within populations.
Plastid genomes of the grasses (Poaceae) are unusual in their organization and rates of sequence evolution. There has been a recent surge in the availability of grass plastid genome sequences, but no comprehensive comparative analysis of genome evolution has yet included any related families in the Poales. We report on the plastid genome of Typha latifolia, the first non-grass member of the Poales sequenced to date, and we present comparisons of genome organization and sequence evolution within Poales. Our results confirm that grass plastid genomes exhibit acceleration in both genomic rearrangements and nucleotide substitutions. Poaceae have multiple structural rearrangements, including three inversions, three gene losses (accD, ycf1, ycf2), intron losses in two genes (clpP, rpoC1), and expansion of the inverted repeat (IR) into both large and small single-copy regions. These rearrangements are restricted to the Poaceae, and IR expansion into the small single-copy region correlates with the phylogeny of the family. Comparisons of 73 protein-coding genes for 47 angiosperms including nine Poaceae genera confirm that the branch leading to Poaceae has significantly accelerated rates of change relative to other monocots and angiosperms. Furthermore, rates of sequence evolution within grasses are lower, indicating a deceleration during diversification of the family. Overall there is a strong correlation between accelerated rates of genomic rearrangements and nucleotide substitutions in Poaceae, a phenomenon that has been noted recently throughout angiosperms. The cause of the correlation is unknown, but faulty DNA repair has been suggested in other systems including bacterial and animal mitochondrial genomes.
Electronic supplementary material
The online version of this article (doi:10.1007/s00239-009-9317-3) contains supplementary material, which is available to authorized users.
Plastid genomics; Molecular evolution; Poales; Poaceae; Grass genomes; Typha latifolia
Many species of stalk-eyed flies (Diopsidae) possess highly exaggerated, sexually dimorphic eye-stalks that play an important role in the mating system of these flies. Eye-stalks are increasingly being used as a model system for studying sexual selection, but little is known about the genetic mechanisms producing variation in these ornamental traits. Therefore, we constructed an EST database of genes expressed in the developing eye-antennal imaginal disc of the highly dimorphic species Teleopsis dalmanni. We used this set of genes to construct microarray slides and compare patterns of gene expression between lines of flies with divergent eyespan.
We generated 33,229 high-quality ESTs from three non-normalized libraries made from the developing eye-stalk tissue at different developmental stages. EST assembly and annotation produced a total of 7,066 clusters comprising 3,424 unique genes with significant sequence similarity to a protein in either Drosophila melanogaster or Anopheles gambiae. Comparisons of the transcript profiles at different stages reveal a developmental shift in relative expression from genes involved in anatomical structure formation, transcription, and cell proliferation at the larval stage to genes involved in neurological processes and cuticle production during the pupal stages. Based on alignments of the EST fragments to homologous sequences in Drosophila and Anopheles, we identified 20 putative gene duplication events in T. dalmanni and numerous genes undergoing significantly faster rates of evolution in T. dalmanni relative to the other Dipteran species. Microarray experiments identified over 350 genes with significant differential expression between flies from lines selected for high and low relative eyespan but did not reveal any primary biological process or pathway that is driving the expression differences.
The catalogue of genes identified in the EST database provides a valuable framework for a comprehensive examination of the genetic basis of eye-stalk variation. Several candidate genes, such as crooked legs, cdc2, CG31917 and CG11577, emerge from the analysis of gene duplication, protein evolution and microarray gene expression. Additional comparisons of expression profiles between, for example, males and females, and species that differ in eye-stalk sexual dimorphism, are now enabled by these resources.
Plastid genome content and arrangement are highly conserved across most land plants and their closest relatives, streptophyte algae, with nearly all plastid introns having invaded the genome in their common ancestor at least 450 million years ago. One such intron, within the transfer RNA trnK-UUU, contains a large open reading frame that encodes a presumed intron maturase, matK. This gene is missing from the plastid genomes of two species in the parasitic plant genus Cuscuta but is found in all other published land plant and streptophyte algal plastid genomes, including that of the nonphotosynthetic angiosperm Epifagus virginiana and two other species of Cuscuta. By examining matK and plastid intron distribution in Cuscuta, we add support to the hypothesis that the normal role of matK is in splicing seven of the eight group IIA introns in the genome. We also analyze matK nucleotide sequences from Cuscuta species and relatives that retain matK to test whether changes in selective pressure in the maturase are associated with intron deletion. Stepwise loss of most group IIA introns from the plastid genome results in substantial change in selective pressure within the hypothetical RNA-binding domain of matK in both Cuscuta and Epifagus, either through evolution from a generalist to a specialist intron splicer or due to loss of a particular intron responsible for most of the constraint on the binding region. The possibility of intron-specific specialization in the X-domain is suggested by evidence of positive selection on the lineage leading to C. nitida, in association with the loss of six of the seven introns putatively spliced by matK. Moreover, transfer RNA gene deletion facilitated by parasitism, combined with an unusually high rate of intron loss from the remaining functional plastid genes, created a unique circumstance on the lineage leading to Cuscuta subgenus Grammica that allowed elimination of matK in the most species-rich lineage of Cuscuta.
Within the salamander family Plethodontidae, five different clades have evolved high levels of enucleated red blood cells, which are extremely unusual among non-mammalian vertebrates. In each of these five clades, the salamanders have large genomes and miniaturized or attenuated body forms. Such a correlation suggests that the loss of nuclei in red blood cells may be related, in part, to the interaction between large genome size and small body size, which has been shown to have profound morphological consequences for the nervous and visual systems in plethodontids. Previous work has demonstrated that variation in both the level of enucleated cells and the size of the nuclear genome exists among species of the monophyletic plethodontid genus Batrachoseps. Here, we report extensive intraspecific variation in levels of enucleated red blood cells in 15 species and provide measurements of red blood cell size, nucleus size, and genome size for 13 species of Batrachoseps. We present a new phylogenetic hypothesis for the genus based on 6,150 bp of mitochondrial DNA sequence data from nine exemplar taxa and use it to examine the relationship between genome size and enucleated red blood cell morphology in a phylogenetic framework. Our analyses demonstrate positive direct correlations between genome size, nucleus size, and both nucleated and enucleated cell sizes within Batrachoseps, although only the relationship between genome size and nucleus size is significant when phylogenetically independent contrasts are used. In light of our results and broader studies of comparative hematology, we propose that high levels of enucleated, variably sized red blood cells in Batrachoseps may have evolved in response to rheological problems associated with the circulation of large red blood cells containing large, bulky nuclei in an attenuate organism.
Batrachoseps; Plethodontidae; Nucleus; Miniaturization; Red blood cells
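The Batrachoseps abstract notes that only the genome size and nucleus size relationship stays significant once phylogenetically independent contrasts are used. The sketch below shows what that transformation does, following Felsenstein's (1985) independent-contrasts algorithm: each internal node yields one standardized difference between its two subtrees, and the contrasts are correlated through the origin. It is a minimal illustration, not the authors' code. It assumes a strictly bifurcating tree with positive branch lengths. The `Node` type, the toy tree, and the trait values are all invented for illustration.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    """Minimal rooted, strictly bifurcating tree node (hypothetical helper type)."""
    name: str = ""
    length: float = 0.0                        # branch length leading to this node
    children: List["Node"] = field(default_factory=list)


def independent_contrasts(node: Node, trait: Dict[str, float],
                          out: List[float]) -> Tuple[float, float]:
    """Post-order pass: append one standardized contrast per internal node to
    `out`; return (estimated node value, extra length to add above this node)."""
    if not node.children:                      # tip: observed value, no extra length
        return trait[node.name], 0.0
    left, right = node.children                # assumes exactly two children
    x1, e1 = independent_contrasts(left, trait, out)
    x2, e2 = independent_contrasts(right, trait, out)
    v1, v2 = left.length + e1, right.length + e2   # must both be > 0
    out.append((x1 - x2) / math.sqrt(v1 + v2))     # standardized contrast
    node_value = (x1 / v1 + x2 / v2) / (1 / v1 + 1 / v2)  # inverse-variance mean
    return node_value, v1 * v2 / (v1 + v2)


def contrasts(tree: Node, trait: Dict[str, float]) -> List[float]:
    out: List[float] = []
    independent_contrasts(tree, trait, out)
    return out


def correlation_through_origin(cx: List[float], cy: List[float]) -> float:
    """Correlation of paired contrasts, computed through the origin."""
    sxy = sum(a * b for a, b in zip(cx, cy))
    return sxy / math.sqrt(sum(a * a for a in cx) * sum(b * b for b in cy))


def tip(name: str, length: float) -> Node:
    return Node(name=name, length=length)


# Toy tree ((A:1,B:1):1,(C:1,D:1):1); all trait values below are invented.
tree = Node(children=[
    Node(length=1.0, children=[tip("A", 1.0), tip("B", 1.0)]),
    Node(length=1.0, children=[tip("C", 1.0), tip("D", 1.0)]),
])
genome_size = {"A": 1.0, "B": 3.0, "C": 5.0, "D": 9.0}
nucleus_size = {"A": 2.0, "B": 6.0, "C": 10.0, "D": 18.0}
print(correlation_through_origin(contrasts(tree, genome_size),
                                 contrasts(tree, nucleus_size)))
```

Because contrasts from sister lineages are statistically independent under a Brownian-motion model, a correlation that survives this transformation is not simply an artifact of shared ancestry.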
The first whole genomes to be compared for phylogenetic inference were those of mitochondria, which provided the first sets of genome-level characters for phylogenetic reconstruction. Most powerful among these characters have been comparisons of the relative arrangements of genes, which have convincingly resolved numerous branch points, including some that had remained recalcitrant even to very large molecular sequence comparisons. Now the world faces a tsunami of complete nuclear genome sequences. In addition to the tremendous amount of DNA sequence that is becoming available for comparison, there is also the potential to develop many more genome-level characters. These include the relative positions of introns, the domain structures of proteins, gene family membership, the presence of particular biochemical pathways, aspects of DNA replication or transcription, and many others. These characters can be especially convincing because they are unlikely to revert to a primitive condition or to arise independently in separate lineages, which reduces homoplasy. The comparisons of organelle genomes pioneered the use of such features for phylogenetic reconstruction. As ever more genomic sequence becomes available, genome-level characters will almost certainly play a major role in outlining the relationships among major animal groups.
genome; evolution; phylogeny; phylogenetically inferred groups; genome-level characters; gene family
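The abstract above singles out comparisons of gene arrangement as especially powerful characters. One simple way to quantify how two gene orders differ is to count breakpoints. A breakpoint is an adjacency present in one genome but absent from the other, whether read on the same strand or the opposite one. The sketch below is an illustration under stated assumptions, not the method of any study cited here. It assumes circular genomes, as in most animal mtDNA, and that each gene occurs exactly once. It also assumes strand is marked by a leading "-". The gene labels and orders in the usage lines are hypothetical.

```python
from typing import List, Set, Tuple

Adjacency = Tuple[str, str]


def flip(gene: str) -> str:
    """Reverse the strand sign of a gene label, e.g. 'cox1' <-> '-cox1'."""
    return gene[1:] if gene.startswith("-") else "-" + gene


def adjacencies(order: List[str]) -> Set[Adjacency]:
    """All adjacent pairs of a circular, signed gene order, read on both strands."""
    adj: Set[Adjacency] = set()
    n = len(order)
    for i in range(n):
        a, b = order[i], order[(i + 1) % n]   # wrap around: genome is circular
        adj.add((a, b))
        adj.add((flip(b), flip(a)))           # same adjacency seen from the other strand
    return adj


def breakpoints(order1: List[str], order2: List[str]) -> int:
    """Number of adjacencies in order1 that are not conserved in order2."""
    genes1 = sorted(g.lstrip("-") for g in order1)
    genes2 = sorted(g.lstrip("-") for g in order2)
    if genes1 != genes2 or len(set(genes1)) != len(genes1):
        raise ValueError("orders must contain the same genes, each exactly once")
    conserved = adjacencies(order2)
    n = len(order1)
    return sum((order1[i], order1[(i + 1) % n]) not in conserved for i in range(n))


# Hypothetical five-gene orders: the second inverts the block cox2-atp8.
ref = ["cox1", "cox2", "atp8", "atp6", "cox3"]
inv = ["cox1", "-atp8", "-cox2", "atp6", "cox3"]
print(breakpoints(ref, inv))
```

A single inversion of an internal block breaks two adjacencies, while rotating the circular order or reading it from the opposite strand breaks none. Shared, derived breakpoints of this kind are what make gene order a low-homoplasy character.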
With the quantity of genomic data increasing at an exponential rate, it is imperative that these data be captured electronically, in a standard format. Standardization activities must proceed under the auspices of open-access and international working bodies. To tackle the issues surrounding the development of better descriptions of genomic investigations, we have formed the Genomic Standards Consortium (GSC). Here, we introduce the minimum information about a genome sequence (MIGS) specification with the intent of promoting participation in its development and discussing the resources that will be required to develop improved mechanisms of metadata capture and exchange. As part of its wider goals, the GSC also supports improving the ‘transparency’ of the information contained in existing genomic databases.
Welwitschia mirabilis is the only extant member of the family Welwitschiaceae, one of three lineages of gnetophytes, an enigmatic group of gymnosperms variously allied with flowering plants or conifers. Limited sequence data and rapid divergence rates have precluded consensus on the evolutionary placement of gnetophytes based on molecular characters. Here we report on the first complete gnetophyte chloroplast genome sequence, from Welwitschia mirabilis, as well as analyses on divergence rates of protein-coding genes, comparisons of gene content and order, and phylogenetic implications.
The chloroplast genome of Welwitschia mirabilis [GenBank: EU342371] comprises 119,726 base pairs and exhibits large and small single-copy regions and two copies of the large inverted repeat (IR). Only 101 unique gene species are encoded. The Welwitschia plastome is the most compact photosynthetic land plant plastome sequenced to date; 66% of the sequence codes for gene products. The genome also exhibits a slightly expanded IR, a minimum of 9 inversions that modify gene order, and 19 genes that are lost or present as pseudogenes. Phylogenetic analyses, including one representative of each extant seed plant lineage and based on 57 concatenated protein-coding sequences, place Welwitschia at the base of all seed plants (distance, maximum parsimony) or as the sister to Pinus (the only conifer representative) in a monophyletic gymnosperm clade (maximum likelihood, Bayesian). Relative rate tests on these gene sequences show the Welwitschia sequences to be evolving at faster rates than other seed plants. For these genes individually, a comparison of average pairwise distances indicates that relative divergence in Welwitschia ranges from about equal to that of other seed plants to almost three times greater than the average for non-gnetophyte seed plants.
Although the basic organization of the Welwitschia plastome is typical, its compactness, gene content and high nucleotide divergence rates are atypical. The current lack of additional conifer plastome sequences precludes any discrimination between the gnetifer and gnepine hypotheses of seed plant relationships. However, both phylogenetic analyses and shared genome features identified here are consistent with either of the hypotheses that link gnetophytes with conifers, but are inconsistent with the anthophyte hypothesis.
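The Welwitschia abstract reports relative rate tests without naming the specific test. As an assumption, the sketch below uses Tajima's (1993) one-degree-of-freedom test to show the idea. Given two ingroup sequences and an outgroup, it counts the sites where only ingroup 1 differs (m1) and those where only ingroup 2 differs (m2). Under a molecular clock the two counts should be equal, so (m1 - m2)^2 / (m1 + m2) is compared with a chi-square distribution with 1 degree of freedom. This sketch skips gapped columns and expects pre-aligned, equal-length input. The toy sequences in the usage lines are invented.

```python
import math
from typing import Tuple


def tajima_relative_rate(seq1: str, seq2: str, outgroup: str) -> Tuple[int, int, float, float]:
    """Tajima-style 1-df relative rate test on three aligned sequences.

    Returns (m1, m2, chi2, p): m1 counts sites where only seq1 differs from
    the shared state of seq2 and the outgroup; m2 is the reverse case.
    """
    if not (len(seq1) == len(seq2) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")
    m1 = m2 = 0
    for a, b, c in zip(seq1.upper(), seq2.upper(), outgroup.upper()):
        if "-" in (a, b, c):
            continue                  # ignore alignment gaps
        if a != b and b == c:
            m1 += 1                   # change inferred on the seq1 lineage
        elif a != b and a == c:
            m2 += 1                   # change inferred on the seq2 lineage
    if m1 + m2 == 0:
        return m1, m2, 0.0, 1.0       # no informative sites: no evidence of rate difference
    chi2 = (m1 - m2) ** 2 / (m1 + m2)
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return m1, m2, chi2, p


# Invented toy alignment: seq1 carries four changes, seq2 none.
out = "ACGTACGTAC"
fast = "CAAAACGTAC"
print(tajima_relative_rate(fast, out, out))
```

Real analyses of plastid genes would typically also account for multiple substitutions at a site and combine evidence across genes; this sketch shows only the core counting logic.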
Group II introns are ribozymes, removing themselves from their primary transcripts, as well as mobile genetic elements, transposing via an RNA intermediate, and are thought to be the ancestors of spliceosomal introns. Although common in bacteria and most eukaryotic organelles, they have never been reported in any bilaterian animal genome, organellar or nuclear. Here we report the first group II intron found in the mitochondrial genome of a bilaterian worm. This location is especially surprising, since animal mitochondrial genomes are generally distinct from those of plants, fungi, and protists by being small and compact, and so are viewed as being highly streamlined, perhaps as a result of strong selective pressures for fast replication while establishing germ plasm during early development. This intron is found in the mtDNA of an annelid worm (an undescribed species of Nephtys), where the complete sequence revealed a 1819 bp group II intron inside the cox1 gene. We infer that this intron is the result of a recent horizontal gene transfer event from a viral or bacterial vector into the mitochondrial genome of Nephtys sp. Our findings hold implications for understanding the mechanisms, constraints, and selective pressures that account for patterns of animal mitochondrial genome evolution.
Members of the genus Cuscuta L. (Convolvulaceae), commonly known as dodders, are epiphytic vines that invade the stems of their hosts with haustorial feeding structures at the points of contact. Although they lack expanded leaves, some species are noticeably chlorophyllous, especially as seedlings and in maturing fruits. Some species are reported as crop pests of worldwide distribution, whereas others are extremely rare and have local distributions and apparent niche specificity. A strong phylogenetic framework for this large genus is essential to understand the interesting ecological, morphological and molecular phenomena that occur within these parasites in an evolutionary context.
Here we present a well-supported phylogeny of Cuscuta using sequences of the nuclear ribosomal internal transcribed spacer and plastid rps2, rbcL and matK from representatives across most of the taxonomic diversity of the genus. We use the phylogeny to interpret morphological and plastid genome evolution within the genus. At least three currently recognized taxonomic sections are not monophyletic, and subgenus Cuscuta is unequivocally paraphyletic. Plastid genes are extremely variable with regard to evolutionary constraint, with rbcL exhibiting even higher levels of purifying selection in Cuscuta than in photosynthetic relatives. Nuclear genome size is highly variable within Cuscuta, particularly within subgenus Grammica, and in some cases may indicate the existence of cryptic species in this large clade of morphologically similar species.
Some morphological characters traditionally used to define major taxonomic splits within Cuscuta are homoplastic and are of limited use in defining true evolutionary groups. The chloroplast genome seems to have evolved in a punctuated fashion, with episodes of loss involving suites of genes or tRNAs followed by stabilization of gene content in major clades. Nearly all species of Cuscuta retain some photosynthetic ability, most likely for nutrient apportionment to their seeds, while complete loss of photosynthesis and possible loss of the entire chloroplast genome is limited to a single small clade of outcrossing species found primarily in western South America.
Plastid genome content and protein sequence are highly conserved across land plants and their closest algal relatives. Parasitic plants, which obtain some or all of their nutrition through an attachment to a host plant, are often a striking exception. Heterotrophy can lead to relaxed constraint on some plastid genes or even total gene loss. We sequenced plastid genomes of two species in the parasitic genus Cuscuta along with a non-parasitic relative, Ipomoea purpurea, to investigate changes in the plastid genome that may result from transition to the parasitic lifestyle.
Aside from loss of all ndh genes, Cuscuta exaltata retains photosynthetic and photorespiratory genes that evolve under strong selective constraint. Cuscuta obtusiflora has incurred substantially more change to its plastid genome, including loss of all genes for the plastid-encoded RNA polymerase. Despite extensive change in gene content and greatly increased rate of overall nucleotide substitution, C. obtusiflora also retains all photosynthetic and photorespiratory genes with only one minor exception.
Although Epifagus virginiana, the only other parasitic plant with its plastid genome sequenced to date, has lost a set of transfer RNA and ribosomal genes that largely overlaps with those lost in Cuscuta, it has also lost all genes related to photosynthesis, and it retains a set of genes that are among the most divergent in Cuscuta. Analyses demonstrate that photosynthetic genes are under the highest constraint of any genes within the plastid genomes of Cuscuta. This indicates that a function involving RuBisCO and electron transport through the photosystems is still the primary reason for retention of the plastid genome in these species.
Teleost fish have seven paralogous clusters of Hox genes, stemming from two complete genome duplications early in vertebrate evolution and an additional genome duplication during the evolution of ray-finned fish, followed by the secondary loss of one cluster. Gene duplication on the one hand, and the evolution of regulatory sequences on the other, are thought to be among the most important mechanisms for the evolution of new gene functions. Cichlid fish, one of the most species-rich vertebrate families with about 2500 species, are famous examples of speciation and morphological diversity. Since this diversity could be based on regulatory changes, we chose to study both the coding and the putative regulatory regions of their Hox clusters within a comparative genomic framework.
We sequenced and characterized all seven Hox clusters of Astatotilapia burtoni, a haplochromine cichlid fish, and performed comparative analyses with data from other teleost fish, namely zebrafish, two species of pufferfish, stickleback, and medaka. We traced losses of Hox-cluster genes and microRNAs; the medaka lineage appears to have lost more microRNAs than the other fish lineages. We found that each teleost genome studied so far has a unique set of Hox genes. The hoxb7a gene was lost independently several times during teleost evolution, the most recent event being within the radiation of East African cichlid fish. Conserved non-coding sequences (CNS) make up a surprisingly large part of the clusters, especially in the HoxAa, HoxCa, and HoxDa clusters. Across all clusters, we observe a trend toward increased CNS content at the anterior end.
The gene content of Hox clusters in teleost fishes is more variable than expected, with each species studied so far having a different set. Although the highest rate of Hox gene loss occurred immediately after the whole-genome duplications, our analyses show that gene loss continued and is still ongoing in all teleost lineages. Along with gene content, CNS content also varies across clusters. The excess of CNS at the anterior end of clusters could imply stronger conservation of anterior expression patterns than of those in more posterior regions of the embryo.
The number of completely sequenced plastid genomes available is growing rapidly, and this array of sequences presents new opportunities for comparative analyses. In comparative studies, it is often useful to compare across wide phylogenetic spans and, within angiosperms, to include representatives from basally diverging lineages such as the genomes reported here: Nuphar advena (from a basal-most lineage) and Ranunculus macranthus (a basal eudicot). We report these two new plastid genome sequences and make comparisons (within angiosperms, seed plants, or all photosynthetic lineages) to evaluate features such as the status of ycf15 and ycf68 as protein-coding genes, the distribution of simple sequence repeats (SSRs) and short dispersed repeats (SDRs), and patterns of nucleotide composition.
The Nuphar [GenBank:NC_008788] and Ranunculus [GenBank:NC_008796] plastid genomes share characteristics of gene content and organization with many other chloroplast genomes. Like other plastid genomes, these genomes are A+T-rich, except for the rRNA and tRNA genes. Detailed comparisons of Nuphar with Nymphaea, another member of the Nymphaeaceae, show that more than two-thirds of these genomes exhibit at least 95% sequence identity and that most SSRs are shared. In broader comparisons, SSRs vary among genomes in abundance and length, and most contain repeat motifs based on A and T nucleotides.
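A statement such as "more than two-thirds of these genomes exhibit at least 95% sequence identity" implies some windowed comparison along an alignment. As a rough illustration only, the sketch below assumes a pre-computed pairwise alignment of two plastomes (gaps written as "-"), splits it into fixed, non-overlapping windows, and reports the fraction of windows at or above an identity threshold. The window size, the gap handling, and the function names are hypothetical choices, not the procedure used for the Nuphar/Nymphaea comparison.

```python
from typing import List


def window_identities(aln_a: str, aln_b: str, window: int = 100) -> List[float]:
    """Identity (as a fraction) for each non-overlapping window of a pairwise alignment.

    Assumptions (hypothetical): gap columns ("-") count as mismatches, and a
    trailing partial window shorter than `window` is ignored.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    identities = []
    for start in range(0, len(aln_a) - window + 1, window):
        a = aln_a[start:start + window].upper()
        b = aln_b[start:start + window].upper()
        matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
        identities.append(matches / window)
    return identities


def fraction_windows_at_identity(aln_a: str, aln_b: str, window: int = 100,
                                 threshold: float = 0.95) -> float:
    """Fraction of windows whose identity is at least `threshold` (illustrative metric)."""
    identities = window_identities(aln_a, aln_b, window)
    if not identities:
        raise ValueError("alignment is shorter than one window")
    return sum(1 for value in identities if value >= threshold) / len(identities)


# Toy alignment: the first window is identical, the second has 10 mismatches.
seq_a = "A" * 200
seq_b = "A" * 190 + "C" * 10
print(fraction_windows_at_identity(seq_a, seq_b))  # 0.5
```

Real plastome alignments would come from a dedicated aligner; the window size and threshold shown here only demonstrate how such a percentage could be tallied.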
SSR and SDR abundance varies by genome and, for SSRs, is proportional to genome size. Long SDRs are rare in the genomes assessed. SSRs occur less frequently than predicted and, although the majority of the repeat motifs do include A and T nucleotides, the A+T bias in SSRs is smaller than that predicted from the underlying genomic nucleotide composition. In codon usage, third positions show an A+T bias; however, variation in codon usage does not correlate with differences in A+T richness. Thus, although plastome nucleotide composition shows "A+T richness", an A+T bias is not apparent upon more in-depth analysis, at least in these respects. The pattern of evolution in the sequences identified as ycf15 and ycf68 is not consistent with their being protein-coding genes; in fact, these regions show no evidence of sequence conservation beyond what is normal for non-coding regions of the IR.
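How SSRs are counted depends on how a repeat is defined. The sketch below uses one simple, assumed definition: perfect tandem repeats of mono-, di-, and trinucleotide motifs above illustrative minimum copy numbers (`MIN_COPIES` holds hypothetical thresholds). It also reports the fraction of SSRs whose motif consists only of A and T, the kind of tally behind a statement that most motifs are A/T based. It is not the search tool or the parameters used in the study.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical minimum copy numbers per motif length (1 = mono-, 2 = di-, 3 = trinucleotide).
# Illustrative assumptions only, not the thresholds used in the study.
MIN_COPIES: Dict[int, int] = {1: 10, 2: 5, 3: 4}


def find_ssrs(seq: str, min_copies: Dict[int, int] = MIN_COPIES) -> List[Tuple[int, str, int]]:
    """Return (start, motif, copy_number) for perfect tandem repeats, sorted by start."""
    seq = seq.upper()
    hits = []
    for k, n in min_copies.items():
        # For k > 1, a negative lookahead rejects homopolymer motifs such as "AA" or "TTT",
        # which are already reported as mononucleotide runs.
        guard = rf"(?!(?P<h>[ACGT])(?P=h){{{k - 1}}})" if k > 1 else ""
        # A k-mer followed by at least n - 1 further copies of itself (greedy).
        pattern = re.compile(guard + rf"(?P<m>[ACGT]{{{k}}})(?P=m){{{n - 1},}}")
        for match in pattern.finditer(seq):
            copies = len(match.group(0)) // k
            hits.append((match.start(), match.group("m"), copies))
    return sorted(hits)


def at_only_fraction(hits: List[Tuple[int, str, int]]) -> float:
    """Fraction of SSRs whose motif contains only A and T (0.0 if there are none)."""
    if not hits:
        return 0.0
    return sum(1 for _, motif, _ in hits if set(motif) <= {"A", "T"}) / len(hits)


toy = "G" + "A" * 12 + "GC" + "AT" * 6 + "G" + "AAT" * 4 + "C"
print(find_ssrs(toy))  # [(1, 'A', 12), (15, 'AT', 6), (28, 'AAT', 4)]
```

Comparing observed SSR counts with those expected from base composition, as the abstract does, would additionally require a null model (for example, shuffled sequences with the same composition), which this sketch does not include.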
The magnoliids, with four orders, 19 families, and 8,500 species, represent one of the largest clades of early-diverging angiosperms. Although several recent angiosperm phylogenetic analyses supported the monophyly of magnoliids and suggested relationships among the orders, the limited number of genes examined yielded only weak support, and these relationships remain controversial. Furthermore, considerable incongruence among analyses has produced phylogenetic reconstructions supporting three different sets of relationships among the magnoliids and the two large angiosperm clades, monocots and eudicots. We sequenced the plastid genomes of three magnoliids, Drimys (Canellales), Liriodendron (Magnoliales), and Piper (Piperales), and used these data in combination with 32 other angiosperm plastid genomes to assess phylogenetic relationships among magnoliids and to examine patterns of variation in GC content.
The Drimys, Liriodendron, and Piper plastid genomes are very similar in size, at 160,604 bp, 159,886 bp, and 160,624 bp, respectively. Gene content and order are nearly identical to those of many other unrearranged angiosperm plastid genomes, including Calycanthus, the other published magnoliid genome. Overall GC content ranges from 34% to 39%, and coding regions have a substantially higher GC content than non-coding regions. Among protein-coding genes, GC content varies by codon position (first > second > third) and by functional group, with photosynthetic genes having the highest percentage and NADH dehydrogenase (ndh) genes the lowest. Phylogenetic analyses using parsimony and likelihood methods on sequences of 61 protein-coding genes provided strong support for the monophyly of magnoliids and identified two strongly supported groups, Canellales/Piperales and Laurales/Magnoliales. Strong support is reported for monocots and eudicots as sister clades, with magnoliids diverging before the monocot-eudicot split. The trees also provided moderate or strong support for the position of Amborella as sister to a clade including all other angiosperms.
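The codon-position pattern (first > second > third) is a simple tally once coding sequences are in frame. The sketch below pools GC counts at each codon position over a list of coding sequences; to compare functional groups, one would call it separately on, say, the photosynthetic genes and the ndh genes. The function name and toy sequences are hypothetical, and the sketch assumes each sequence starts at the first base of a codon.

```python
from typing import Iterable, Tuple


def gc_by_codon_position(cds_list: Iterable[str]) -> Tuple[float, float, float]:
    """Pooled GC fraction at codon positions 1, 2 and 3 across in-frame coding sequences."""
    gc = [0, 0, 0]
    total = [0, 0, 0]
    for cds in cds_list:
        cds = cds.upper()
        if len(cds) % 3 != 0:
            raise ValueError("coding sequence length must be a multiple of 3")
        for i, base in enumerate(cds):
            pos = i % 3  # 0, 1, 2 -> first, second, third codon position
            total[pos] += 1
            if base in ("G", "C"):
                gc[pos] += 1
    if total[0] == 0:
        raise ValueError("no codons supplied")
    return (gc[0] / total[0], gc[1] / total[1], gc[2] / total[2])


genes = ["GCAGATCCA", "ATG"]  # toy sequences, not real plastid genes
print(gc_by_codon_position(genes))  # (0.75, 0.5, 0.25)
```

Pooling counts across genes, rather than averaging per-gene percentages, weights long genes more heavily; either choice is defensible, and this one is only an assumption.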
Evolutionary comparisons of three new magnoliid plastid genome sequences, combined with other published angiosperm genomes, confirm that GC content is unevenly distributed across the genome by location, codon position, and functional group. Furthermore, phylogenetic analyses provide the strongest support so far for the hypothesis that the magnoliids are sister to a large clade that includes both monocots and eudicots.
Mitochondria contain small genomes that are physically separate from those of nuclei. Their comparison serves as a model system for understanding the processes of genome evolution. Although complete mitochondrial genome sequences have been reported for more than 600 animals, the taxonomic sampling is highly biased toward vertebrates and arthropods, leaving much of the diversity yet uncharacterized.
The mitochondrial genome of the bellybutton nautilus, Nautilus macromphalus, a cephalopod mollusk, is 16,258 nts in length and 59.5% A+T, both values typical of animal mitochondrial genomes. It contains the 37 genes that are almost universally found in animal mtDNAs, with 15 on one DNA strand and 22 on the other. The arrangement of these genes can be derived from that of the distantly related Katharina tunicata (Mollusca: Polyplacophora) by a switch in position of two large blocks of genes and transpositions of four tRNA genes. There is strong skew in the distribution of nucleotides between the two strands, and analysis of this skew yields insight into modes of transcription and replication. There is an unusually large number of non-coding regions, whose function, if any, is not known; however, several of these demarcate abrupt shifts in nucleotide skew, and there are several identical sequence elements at these junctions, suggesting that they may play roles in transcription and/or replication. One of the non-coding regions contains multiple repeats of a tRNA-like sequence. Some of the tRNA genes appear to overlap on the same strand, but this could be resolved if the polycistronic transcript were cleaved at the beginning of the downstream gene, followed by polyadenylation of the product of the upstream gene to form a fully paired structure.
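Strand skew is conventionally summarized on one strand as AT skew = (A - T)/(A + T) and GC skew = (G - C)/(G + C). The sketch below computes both in sliding windows and flags adjacent windows where the skew changes sign, one simple, assumed way to locate abrupt shifts in nucleotide skew like those described above. The window size, step, and function names are hypothetical and are not the analysis performed in the study.

```python
from typing import List, Tuple


def skews(seq: str) -> Tuple[float, float]:
    """Return (AT skew, GC skew) = ((A - T)/(A + T), (G - C)/(G + C)) for one strand."""
    s = seq.upper()
    a, t, g, c = (s.count(base) for base in "ATGC")
    at_skew = (a - t) / (a + t) if a + t else 0.0
    gc_skew = (g - c) / (g + c) if g + c else 0.0
    return at_skew, gc_skew


def windowed_skews(seq: str, window: int, step: int) -> List[Tuple[int, float, float]]:
    """(window start, AT skew, GC skew) for each full sliding window along the sequence."""
    return [
        (start, *skews(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


def sign_changes(values: List[float]) -> List[int]:
    """Indices i where values[i] and values[i + 1] have strictly opposite signs."""
    return [i for i in range(len(values) - 1) if values[i] * values[i + 1] < 0]


toy = "GGGGAAAT" + "CCCCTTTA"  # toy sequence whose skew flips halfway
rows = windowed_skews(toy, window=8, step=8)
print(rows)  # [(0, 0.5, 1.0), (8, -0.5, -1.0)]
print(sign_changes([gc for _, _, gc in rows]))  # [0]
```

On a real mtDNA, overlapping windows (step smaller than window) would give a smoother profile, and any flagged boundary would still need to be checked against the positions of the non-coding regions.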
Nautilus macromphalus mtDNA has the expected gene content and has undergone few rearrangements since the evolutionary split between cephalopods and polyplacophorans. It contains an unusually large number of non-coding regions, which is notable because such regions are often generated by the same processes that produce gene rearrangements. The skew in nucleotide composition between the two strands is strong and is associated with the direction of transcription in various parts of the genome, but a comparison with K. tunicata implies that mutational bias during replication also plays a role. This appears to be yet another case in which polyadenylation of mitochondrial tRNAs restores what would otherwise be an incomplete structure.
We present here the PhIGs database, a phylogenomic resource for sequenced genomes. Although many methods exist for clustering gene families, very few attempt to create truly orthologous clusters sharing descent from a single ancestral gene across a range of evolutionary depths. Although such non-phylogenetic gene family clusters have been used broadly for gene annotation, they are known to introduce errors through the artifactual grouping of slowly evolving paralogs and through the failure to annotate more rapidly evolving genes. A full phylogenetic framework is necessary for accurate inference of function and for many studies that address the pattern and mechanisms of genome evolution. The automated generation of evolutionary gene clusters, the construction of gene trees, the determination of orthology and paralogy relationships, and the correlation of this information with gene annotations, expression data, and genomic context together constitute an important resource for the scientific community.
The PhIGs database currently contains 23 completely sequenced genomes of fungi and metazoans, comprising 409,653 genes that have been grouped into 42,645 gene clusters. Each gene cluster is built such that the gene sequence distances are consistent with the known organismal relationships, thereby maximizing the likelihood that the clusters represent truly orthologous genes. The PhIGs website contains tools that allow genes to be studied within their phylogenetic framework, through keyword searches on annotations such as GO and InterPro assignments, and through sequence similarity searches by BLAST and HMM. In addition to displaying the evolutionary relationships of the genes in each cluster, the website allows users to view the relative physical positions of homologous genes in specified sets of genomes.
Accurate analyses of genes and genomes can only be done within their full phylogenetic context. The PhIGs database and corresponding website address this problem for the scientific community. Our goal is to expand the content as more genomes are sequenced and use this framework to incorporate more analyses.
Pentastomids are a small group of vermiform animals with unique morphology and parasitic lifestyle. They are generally recognized as being related to the Arthropoda; however, the nature of this relationship is controversial. We have determined the complete sequence of the mitochondrial DNA (mtDNA) of the pentastomid Armillifer armillatus and complete or nearly complete mtDNA sequences from representatives of four previously unsampled groups of Crustacea: Remipedia (Speleonectes tulumensis), Cephalocarida (Hutchinsoniella macracantha), Cirripedia (Pollicipes polymerus) and Branchiura (Argulus americanus). Analyses of the mtDNA gene arrangements and sequences determined in this study indicate unambiguously that pentastomids are a group of modified crustaceans probably related to branchiurans. In addition, gene arrangement comparisons strongly support an unforeseen assemblage of pentastomids with maxillopod and cephalocarid crustaceans, to the exclusion of remipedes, branchiopods, malacostracans and hexapods.
The big-headed turtle (Platysternon megacephalum) from East Asia is the sole living representative of a poorly studied turtle lineage (Platysternidae). It has no close living relatives, and its phylogenetic position within turtles is one of the outstanding controversies in turtle systematics. Platysternon was traditionally considered to be close to snapping turtles (Chelydridae) based on some studies of its morphology and mitochondrial (mt) DNA; however, other studies of morphology and nuclear (nu) DNA do not support that hypothesis.
We sequenced the complete mt genome of Platysternon and the nearly complete mt genomes of two other relevant turtles and compared them with turtle mt genomes from the literature to form the largest molecular dataset used to date to address this issue. The resulting phylogeny robustly rejects the placement of Platysternon with Chelydridae and instead shows that it is a member of the Testudinoidea, a diverse, nearly globally distributed group that includes pond turtles and tortoises. We also discovered that Platysternon mtDNA has large-scale gene rearrangements and possesses two nearly identical control regions, features that distinguish it from all other studied turtles.
Our study robustly determines the phylogenetic placement of Platysternon and provides a well-resolved outline of major turtle lineages, while demonstrating the much greater resolving power of large amounts of mt sequence compared with short fragments. Earlier phylogenies placing Platysternon with chelydrids required a temporal gap in the fossil record that is now unnecessary. The duplicated control regions and gene rearrangements of the Platysternon mtDNA probably resulted from the duplication of part of the genome and the subsequent loss of redundant genes. Having two control regions might provide some advantage, which would explain why both control regions were maintained while some of the duplicated genes were eroded, but examples of this are rare. So far, duplicated control regions have been reported for mt genomes from just 12 clades of metazoans, including Platysternon.