Large-scale evolutionary studies often require the automated construction of alignments for a large number of homologous gene families. Most eukaryotic genes can produce different transcripts through alternative splicing or alternative transcription initiation, and many such transcripts encode different protein isoforms. Because analyses tend to be gene centered, a single protein isoform per gene is selected for the alignment. The de facto approach is to use the longest protein isoform per gene (Longest), presumably to avoid including partial sequences and to maximize sequence information. Here, we show that this approach is problematic. It increases the number of indels in the alignments by including nonhomologous regions, such as those derived from species-specific exons, and thereby increases the number of misaligned positions. To ameliorate this problem, we have developed a novel heuristic, Protein ALignment Optimizer (PALO), which, for each gene family, selects the combination of protein isoforms that are most similar in length. We examine several evolutionary parameters inferred from alignments that differ only in the method used to select the protein isoform combination. The four methods are Longest, PALO, the combination that results in the highest sequence conservation, and a randomly selected combination. We observe that Longest tends to overestimate both nonsynonymous and synonymous substitution rates compared with PALO, most likely because of an excess of misaligned positions. The maximum likelihood estimate of the fraction of genes that have experienced positive selection is very sensitive to the isoform selection method. This holds both when alignments are constructed with MAFFT and when they are constructed with Prank+F. Longest performs better than a random combination but still estimates up to 3 times more positively selected genes than the combination showing the highest conservation, indicating the presence of many false positives. We show that PALO can eliminate the majority of such false positives and is thus a more appropriate approach than Longest for large-scale analyses. A web server has been set up to facilitate the use of PALO given a user-defined set of gene families; it is available at http://evolutionarygenomics.imim.es/palo.
protein isoform; alternative splicing; alignment; evolutionary rate; positive selection
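The selection criterion described above can be illustrated with a minimal brute-force sketch. It assumes each gene family is given as a mapping from gene to its isoform lengths. This is not the PALO implementation. The function name `most_similar_length_combination`, the input layout, the use of the max-min length range as the similarity measure, and the tie-break toward longer isoforms are all hypothetical choices made for illustration.

```python
from itertools import product


def most_similar_length_combination(family):
    """Choose one isoform per gene so that protein lengths are as similar as possible.

    family: dict mapping gene ID -> dict mapping isoform ID -> length (aa).
    Hypothetical criterion: minimize (max length - min length); ties are
    broken in favour of the larger total length.
    """
    genes = sorted(family)
    options = [sorted(family[gene].items()) for gene in genes]
    best_key, best_combo = None, None
    # Enumerate every combination of one isoform per gene (exponential cost).
    for combo in product(*options):
        lengths = [length for _, length in combo]
        key = (max(lengths) - min(lengths), -sum(lengths))
        if best_key is None or key < best_key:
            best_key, best_combo = key, combo
    return {gene: isoform for gene, (isoform, _) in zip(genes, best_combo)}


# Hypothetical family: the longest isoforms (h2, m1, d1) span 94 aa,
# whereas h1, m1, d1 span only 7 aa.
family = {
    "human": {"h1": 420, "h2": 512},
    "mouse": {"m1": 418, "m2": 300},
    "dog": {"d1": 425},
}
print(most_similar_length_combination(family))  # {'dog': 'd1', 'human': 'h1', 'mouse': 'm1'}
```

Exhaustive enumeration grows exponentially with the number of genes and isoforms. For realistic family sizes, a heuristic search such as the one the abstract describes is presumably needed instead of this exhaustive loop.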
Whole-genome duplications (WGDs) have recurred during angiosperm evolution, resulting in many duplicated chromosomal segments. Local gene duplications are also widespread in angiosperms. WGD-derived duplicates (ohnologs) and local duplicates often show contrasting patterns of gene retention and evolution. However, many angiosperm genes have undergone multiple gene duplication events, possibly by different modes, indicating that different modes of gene duplication are not mutually exclusive. In two representative angiosperm genomes, Arabidopsis (Arabidopsis thaliana) and rice (Oryza sativa), we found that 9.6% and 11.3% of unique ohnologs, respectively, were also involved in local duplications. These correspond to 15.5% and 17.1% of ohnolog pairs. Locally duplicated ohnologs are widely distributed across duplicated chromosomal segments and are functionally biased. We measured coding sequence divergence between duplicated genes as nonsynonymous (Ka) and synonymous (Ks) substitution rates. Locally duplicated ohnolog pairs tend to have higher Ka, Ka/Ks, and gene expression divergence than nonlocally duplicated ohnolog pairs. Locally duplicated ohnologs also tend to have higher interspecies sequence divergence. These observations indicate that locally duplicated ohnologs evolve faster than nonlocally duplicated ohnologs. This study highlights the need to take local duplications into account when analyzing the evolutionary dynamics of ohnologs.
local gene duplication; whole-genome duplication; ohnolog; divergence; colinearity
We generated a genome-wide replication profile of Lachancea kluyveri and assessed the relationship between replication and base composition. This species diverged from Saccharomyces cerevisiae before the ancestral whole-genome duplication. Its genome comprises eight chromosomes, one of which carries a 1-Mb chromosomal arm with a G + C content much higher than that of the rest of the genome. We identified 252 active replication origins in L. kluyveri and found considerable divergence in origin location compared with S. cerevisiae and Lachancea waltii. Some global features of S. cerevisiae replication are conserved: centromeres replicate early, whereas telomeres replicate late. However, we found that replication origins in both L. kluyveri and L. waltii do not behave as evolutionary fragile sites. In L. kluyveri, replication timing along chromosomes alternates between regions of early- and late-activating origins, except in the 1-Mb GC-rich chromosomal arm. This arm contains an origin consensus motif that differs from that of the other chromosomes, and it is replicated early during S phase. We show that this early replication results from the specific absence of late-firing origins in this arm. In addition, we found a correlation between GC content and distance from replication origins specifically in this GC-rich arm. The same arm also lacks the replication-associated compositional skew between leading and lagging strands. These findings suggest that the unusual base composition of the L. kluyveri genome could be linked to replication.
Lachancea kluyveri; Saccharomyces cerevisiae; replication; ACS; GC content; GC skew
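GC content and GC skew are the quantities behind the compositional analyses above. A minimal sketch of how they can be computed per window is shown below. It assumes the common single-strand definition of GC skew, (G - C)/(G + C). The function names and window parameters are illustrative, not taken from the study. On typical chromosomes, the skew changes sign where leading- and lagging-strand synthesis switch, so a missing skew is the pattern the abstract describes for the GC-rich arm.

```python
def gc_content(seq):
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def gc_skew(seq):
    """GC skew (G - C) / (G + C); 0.0 when the window has no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if g + c else 0.0


def windowed(seq, size, step, stat):
    """Apply stat to consecutive windows, returning (start, value) pairs."""
    return [(start, stat(seq[start:start + size]))
            for start in range(0, len(seq) - size + 1, step)]


# A toy sequence whose skew flips sign halfway along, as it would
# across a replication origin on a real chromosome.
toy = "GGGACGGA" + "CCCTGCCT"
print(windowed(toy, 8, 8, gc_skew))
print(gc_content(toy))
```

On real data the windows would be kilobases long and overlapping. In that setting, the cumulative sum of the skew is often plotted to locate its extrema.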
Many insect species have established long-term symbiotic relationships with intracellular bacteria. Symbiosis with bacteria has provided insects with novel ecological capabilities, which have allowed them to colonize previously unexplored niches. The evolution of symbiotic relationships is important for understanding the emergence of biological complexity, yet it remains largely a mystery in evolutionary biology. In this study, we sequenced the genome of Blattabacterium cuenoti, the primary endosymbiont of the omnivorous cockroach Blatta orientalis. This is one of the most ancient symbiotic associations, and its genome helps us investigate the evolutionary leaps enabled by mutualistic symbioses. We compare this new genome with those of previously sequenced endosymbionts. These include the endosymbionts of the omnivorous hosts Blattella germanica (Blattellidae) and Periplaneta americana (Blattidae). They also include those of two wood-feeding hosts, the subsocial cockroach Cryptocercus punctulatus (Cryptocercidae) and the termite Mastotermes darwiniensis (Mastotermitidae). Our study shows remarkable evolutionary stasis of this symbiotic system throughout the evolutionary history of cockroaches and the deepest-branching termite, M. darwiniensis. This stasis extends to both chromosome architecture and gene content, as revealed by the striking conservation of the Blattabacterium core genome. Importantly, the architecture of the central metabolic network inferred from the endosymbiont genomes was established very early in Blattabacterium evolutionary history. It could be an outcome of the essential role this endosymbiont plays in the host’s nitrogen economy.
Blattabacterium endosymbiont; Blatta orientalis; nitrogen metabolism; pan-genome; urease; genome reduction
The vomeronasal organ (VNO) is an olfactory structure that detects pheromones and environmental cues. It consists of sensory neurons that express evolutionarily unrelated groups of transmembrane chemoreceptors. The predominant V1R and V2R receptor repertoires are believed to detect airborne and water-soluble molecules, respectively. It has been suggested that the shift in habitat of early tetrapods from water to land is reflected by an increase in the ratio of V1R to V2R genes. Snakes have a very large VNO associated with a sophisticated tongue delivery system, yet they have so far been missing from such analyses. Here, we use RNA-seq and RNA in situ hybridization to study the diversity, evolution, and expression pattern of the corn snake vomeronasal receptor repertoires. Our analyses indicate that snakes and lizards retain an extremely limited number of V1R genes but have a large number of V2R genes, including multiple lineages of reptile-specific and snake-specific expansions. Finally, we show that the peculiar bigenic pattern of V2R gene transcription observed in mammals is conserved in squamate reptiles. This hints at an important but unknown functional role for this expression strategy. Our results do not support the hypothesis that the shift to a V1R-dominated vomeronasal receptor repertoire in mammals reflects the evolutionary transition of early tetrapods from water to land. This study sheds light on the evolutionary dynamics of the vomeronasal receptor families in vertebrates. It also reveals how mammals and squamates differentially adapted the same ancestral vomeronasal repertoire to succeed in a terrestrial environment.
vomeronasal organ (VNO); monogenic expression; evolution of sensorial abilities; squamates; snakes; phylogeny
Expansion and contraction of microRNA (miRNA) families can be studied in sequenced plant genomes through sequence alignments. Here, we focused on miR169 in sorghum because of its implications for drought tolerance and stem-sugar content. We discovered many miR169 copies that had escaped standard genome annotation methods. A new miR169 cluster was found on sorghum chromosome 1. It is composed of the previously annotated sbi-MIR169o together with two newly found MIR169 copies, named sbi-MIR169t and sbi-MIR169u. We also found a miR169 cluster on sorghum chr7 consisting of sbi-MIR169l, sbi-MIR169m, and sbi-MIR169n. This cluster lies within a chromosomal inversion of at least 500 kb that occurred in sorghum relative to Brachypodium, rice, foxtail millet, and maize. Surprisingly, synteny of chromosomal segments containing MIR169 copies with linked bHLH and CONSTANS-LIKE genes extended from Brachypodium to dicotyledonous species such as grapevine, soybean, and cassava. This indicates strong conservation of the linkage between certain flowering and/or plant height genes and microRNAs. Such linkage may explain linkage drag of drought and flowering traits and would have consequences for breeding new varieties. Furthermore, alignment of rice and sorghum orthologous regions revealed two additional miR169 gene copies (miR169r and miR169s) on sorghum chr7 that form an antisense miRNA gene pair. Both copies are expressed and target different sets of genes. Synteny-based analysis of microRNAs among plant species should lead to the discovery of new microRNAs in general and contribute to our understanding of their evolution.
comparative genomics; grasses; synteny; linkage drag; flowering; drought
Orphan genes are genes that lack detectable similarity to genes in other species, so no clear signal of common descent (i.e., homology) can be inferred for them. They typically make up 10% to 30% of all genes in a genome, yet they remain enigmatic because their origin and function are mostly unknown. Several case studies have demonstrated that orphans can contribute to lineage-specific adaptation. Here, we study orphan genes by comparing 30 arthropod genomes, focusing in particular on seven recently sequenced ant genomes. This setup allows us to analyze a major metazoan taxon and to compare social Hymenoptera (ants and bees) with nonsocial Diptera (flies and mosquitoes). First, we find that recently split lineages undergo accelerated genomic reorganization, including the rapid gain of many orphan genes. Second, between the two insect orders Hymenoptera and Diptera, orphan genes are more abundant and emerge more rapidly in Hymenoptera, particularly in leaf-cutter ants. With respect to intragenomic localization, we find that ant orphan genes show little clustering. This suggests that they are scattered uniformly over the genome and between nonorphan genes. Finally, our results indicate that the genetic mechanisms creating orphan genes act at different rates in insects, primates, and plants. These mechanisms include gene duplication, frame-shift fixation, creation of overlapping genes, horizontal gene transfer, and exaptation of transposable elements. In Formicidae, the majority of orphan genes originate in intergenic regions. This points to a high rate of de novo gene formation or generalized gene loss and supports a recently proposed dynamic model of frequent gene birth and death.
orphan genes; genome evolution; insects; ants (Formicidae)
Insect-infecting trypanosomatid flagellates of the genera Angomonas and Strigomonas have long been known to harbor bacterial endosymbionts that supplement host metabolism (Candidatus Kinetoplastibacterium or TPE [trypanosomatid proteobacterial endosymbiont]). Previous analyses of endosymbiont genomes from other bacterial lineages have characterized a stereotypical path of genome evolution in such bacteria over the course of their association with a eukaryotic host. In this work, we sequence and analyze the genomes of five TPEs and reconstruct their metabolism. We also carry out an extensive phylogenomic analysis with all available Betaproteobacteria and compare the TPEs with their nearest betaproteobacterial relatives. In addition, we identify a number of housekeeping and central metabolism genes that seem to have undergone positive selection. Our genome structure analyses show total synteny among the five TPEs despite millions of years of divergence. They also show that this lineage follows the common path of genome evolution observed in other endosymbionts of diverse ancestries. As previously suggested by cell biology and biochemistry experiments, Ca. Kinetoplastibacterium spp. preferentially maintain the genes necessary for the biosynthesis of compounds needed by their hosts. We also show that metabolic and informational genes related to cooperation with the host are overrepresented among the genes inferred to be under positive selection. Finally, our phylogenomic analysis shows that these endosymbionts belong to the Alcaligenaceae family of Betaproteobacteria. However, their closest relatives are not in the genus Bordetella, as previously reported, but more likely in the genus Taylorella.
endosymbiont biology; phylogenomics; comparative genomics; Trypanosomatidae; selective pressure
Recent studies have suggested a role for the human endogenous retrovirus (HERV) group HERV-K(HML-2) in melanoma because of upregulated transcription and expression of HERV-K(HML-2)-encoded proteins. Very little is known about which HML-2 loci are transcribed in melanoma. We assigned >1,400 HML-2 cDNA sequences generated from various melanoma and related samples to genomic HML-2 loci, identifying a total of 23 transcribed loci. Transcription profiles of loci differed significantly between samples. One locus was transcribed only in melanoma-derived samples and not in melanocytes, so it might represent a marker for melanoma. Several of the transcribed loci harbor ORFs for retroviral Gag and/or Env proteins. Env-encoding loci were transcribed only in melanoma. Specific investigation of rec and np9 transcripts indicated transcription of protein-encoding loci in melanoma and melanocytes, hinting at the relevance of Rec and Np9 in melanoma. UVB irradiation changed the transcription profiles of loci, and overall transcript levels decreased in melanoma and melanocytes. We further identified transcribed HML-2 loci formed by reverse transcription of spliced HML-2 transcripts, either by L1 machinery or in a retroviral fashion. Some of these loci potentially encode HML-2-like proteins. We reveal complex, sample-specific transcription of HML-2 loci in melanoma and related samples. The identified HML-2 loci and the proteins they encode are particularly relevant for further studying the role of HML-2 in melanoma. HERV transcription appears to be a complex process. Specific studies are needed to elucidate which HERV loci are transcribed and how transcribed HERVs may be involved in disease.
repetitive DNA; HERV; provirus; transcription; retrotransposition; neoplasms
During mammalian evolution, some genes essential for placental development have arisen from retroviruses. For example, syncytin-1, which is derived from an envelope (env) gene of an endogenous retrovirus (ERV), aids cell fusion in the human placenta. The placenta serves the same function in all placental mammals. However, the env-derived genes responsible for trophoblast cell fusion and maternal immune tolerance differ among species and remain largely unidentified in cattle. To comprehensively examine env-derived genes that play a role in bovine placental development, we determined the transcriptomic profiles of bovine conceptuses. These profiles were obtained by high-throughput sequencing during three crucial windows of the implantation period. The sequence reads were mapped to the bovine genome, in which ERV candidates were annotated using RetroTector© (7,624 ERV-derived and 1,542 env-derived genes). The mapped reads showed that approximately 18% (284 genes) of the env-derived genes in the genome were expressed during placenta formation. Approximately 4% (63 genes) were detected on all days examined. We verified by polymerase chain reaction that three env-derived genes are expressed in trophoblast cells. Of these three, the env-derived gene with the longest open reading frame (named BERV-P env) showed high expression levels in trophoblast cell lines. Its sequence was similar to those of the syncytin-Car1 genes found in dogs and cats, despite their disparate origins. These results suggest that placentation depends on various retrovirus-derived genes that could have replaced endogenous predecessors during evolution.
endogenous retrovirus; RNA-seq; syncytin; envelope; cow
Hybridization and abiotic stress are natural agents hypothesized to influence the activation and proliferation of transposable elements in wild populations. In this report, we examine how these agents affect the expression dynamics of both quiescent and transcriptionally active sublineages of long terminal repeat (LTR) retrotransposons. We focus on wild sunflower species with a notable history of transposable element proliferation. For the annual sunflower species Helianthus annuus and H. petiolaris, neither early-generation hybridization nor abiotic stress, alone or in combination, induced transcriptional activation of quiescent sublineages of LTR retrotransposons. These treatments also failed to further induce expression of sublineages that are already transcriptionally active. Instead, expression of active sublineages in F1 and backcross hybrids was indistinguishable from, or intermediate relative to, that of the parental lines. Abiotic stress generally decreased normalized expression relative to controls. Findings were different for ancient sunflower hybrid species derived from these same two species, which have undergone massive proliferation of LTR retrotransposons. These species display 2× to 6× higher expression levels of transcriptionally active sublineages than the parental species H. annuus and H. petiolaris. Implications and possible explanations for these findings are discussed.
LTR retrotransposon; TE activation; hybridization; abiotic stress; gypsy; copia
The most bacteria-like mitochondrial genome known is that of the jakobid flagellate Reclinomonas americana NZ. This genome also encodes the largest known gene set among mitochondrial DNAs (mtDNAs), including the RNA subunit of RNase P (transfer RNA processing), a reduced form of transfer–messenger RNA (translational control), and a four-subunit bacteria-like RNA polymerase, which in other eukaryotes is substituted by a nucleus-encoded, single-subunit, phage-like enzyme. Further, protein-coding genes are preceded by potential Shine–Dalgarno translation initiation motifs. Whether similarly ancestral mitochondrial characters also exist in relatives of R. americana NZ is unknown. Here, we report a comparative analysis of nine mtDNAs from five distant jakobid genera: Andalucia, Histiona, Jakoba, Reclinomonas, and Seculamonas. We find that Andalucia godoyi has an even larger mtDNA gene complement than R. americana NZ. The extra genes are rpl35 (a large subunit mitoribosomal protein) and cox15 (involved in cytochrome oxidase assembly), which are nucleus encoded throughout other eukaryotes. Andalucia cox15 is strikingly similar to its homolog in the free-living α-proteobacterium Tistrella mobilis. Similarly, a long, highly conserved gene cluster in jakobid mtDNAs, which is a clear vestige of prokaryotic operons, displays a gene order more closely resembling that in free-living α-proteobacteria than in Rickettsiales species. Although jakobid mtDNAs, overall, are characterized by bacteria-like features, they also display a few remarkably divergent characters, such as 3′-tRNA editing in Seculamonas ecuadoriensis and genome linearization in Jakoba libera. Phylogenetic analysis with mtDNA-encoded proteins strongly supports monophyly of jakobids with Andalucia as the deepest divergence. However, it remains unclear which α-proteobacterial group is the closest mitochondrial relative.
complete mtDNA sequences; genome evolution; gene migration to nucleus; excavates
The cAMP receptor protein (CRP)/fumarate and nitrate reduction regulatory protein (FNR)-type transcription factors (TFs) form a well-characterized global TF family in bacteria. They have two conserved domains: an N-terminal domain that binds small-molecule ligands (e.g., cAMP, NO, or O2) and a C-terminal DNA-binding domain. CRP/FNR-type TFs recognize very similar consensus DNA target sequences, yet they can regulate different sets of genes in response to environmental signals. To clarify the evolution of CRP/FNR-type TFs throughout the bacterial kingdom, we undertook a comprehensive computational analysis of a large number of annotated CRP/FNR-type TFs and the corresponding bacterial genomes. Based on amino acid sequence similarities among 1,455 annotated CRP/FNR-type TFs, spectral clustering classified the TFs into 12 representative groups. Stepwise clustering allowed us to propose a possible process of protein evolution. Each cluster consists mainly of functionally distinct members (e.g., CRP, NTC, FNR-like protein, and FixK). Even so, FNR-related TFs are found in several groups and are distributed across a wide range of bacterial phyla in the sequence similarity network. This result suggests that CRP/FNR-type TFs originated from an ancestral FNR protein involved in nitrogen fixation. Furthermore, a phylogenetic profiling analysis showed that combinations of TFs and their target genes have fluctuated dynamically during bacterial evolution. A genome-wide analysis of TF-binding sites also suggested that the diversity of the transcriptional regulatory system arose through the stepwise adaptation of TF-binding sites to the evolution of TFs.
molecular evolution; phylogenetics; spectral clustering; transcription factor; cis-element
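The idea behind spectral clustering of a sequence similarity matrix can be sketched with NumPy in its simplest, two-way form. First, the graph Laplacian is built from pairwise similarities. The proteins are then split by the sign of the eigenvector for the second-smallest eigenvalue (the Fiedler vector). The study classified 1,455 TFs into 12 groups, which would require a k-way variant, for example k eigenvectors followed by k-means. The function name, the unnormalized Laplacian, and the toy similarity values below are assumptions for illustration only, not the study's pipeline.

```python
import numpy as np


def spectral_bipartition(similarity):
    """Split items into two groups using the Fiedler vector of the graph Laplacian.

    similarity: symmetric (n x n) array of non-negative pairwise similarities.
    Returns an array of 0/1 labels (the label values themselves are arbitrary).
    """
    W = np.asarray(similarity, dtype=float)
    L = np.diag(W.sum(axis=1)) - W          # unnormalized Laplacian L = D - W
    _, eigvecs = np.linalg.eigh(L)          # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]                 # second-smallest eigenvector
    return (fiedler > 0).astype(int)


# Toy similarities: proteins 0-2 and 3-5 form two tight groups.
strong, weak = 1.0, 0.01
S = np.full((6, 6), weak)
S[:3, :3] = strong
S[3:, 3:] = strong
np.fill_diagonal(S, 0.0)
print(spectral_bipartition(S))
```

Because eigenvector signs are arbitrary, the output may read `[1 1 1 0 0 0]` or `[0 0 0 1 1 1]`; only the grouping is meaningful.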
Microsatellites (SSRs) are highly susceptible to expansions and contractions. In a coding sequence, inserting or deleting a single unit of a mono-, di-, tetra-, or pentanucleotide SSR creates a frameshift. Because of this strong deleterious potential, one would expect to find very few such SSRs in coding sequences. Unexpectedly, genomes contain many coding SSRs of all types. Here, we report a study of their evolution in a phylogenetic context using the genomes of four primates: human, chimpanzee, orangutan, and macaque. We analyzed a set of 5,015 orthologous genes unambiguously aligned among the four species. In these genes, tri- and hexanucleotide SSRs frequently show insertions and deletions, but all other SSRs in coding regions evolve mainly by substitutions. The substitution rate in all types of coding SSRs is typically twice as high as in the rest of the coding sequence. Additionally, we observe that numerous coding SSRs are created and lost by substitutions along the lineages, yet their numbers remain constant. This suggests that coding SSRs have reached an equilibrium, which we hypothesize involves a combination of mutation, drift, and selection. We therefore estimated the fitness cost of mono-SSRs and show that it increases with the number of units. Finally, we show that the cost of coding mono-SSRs varies greatly among gene functions. This suggests that the strength of selection acting against them may be correlated with gene function.
SSR; microsatellites; phylogeny; primate genomes
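The frameshift argument above reduces to simple arithmetic. Inserting or deleting one repeat unit of length k shifts the reading frame whenever k is not a multiple of 3. The sketch below checks this rule for unit lengths 1 to 6. It also scans a sequence for mononucleotide SSRs with a regular expression. The function names and the six-unit minimum for calling a mono-SSR are illustrative assumptions, not the study's detection criteria.

```python
import re


def one_unit_indel_shifts_frame(unit_length):
    """True if gaining or losing a single repeat unit changes the reading frame."""
    return unit_length % 3 != 0


def find_mono_ssrs(seq, min_units=6):
    """Return (start, base, run_length) for runs of one base at least min_units long."""
    # A base followed by at least (min_units - 1) copies of itself.
    pattern = r"([ACGT])\1{" + str(min_units - 1) + ",}"
    return [(m.start(), m.group(1), len(m.group(0)))
            for m in re.finditer(pattern, seq.upper())]


print([k for k in range(1, 7) if one_unit_indel_shifts_frame(k)])  # [1, 2, 4, 5]
print(find_mono_ssrs("ATG" + "A" * 7 + "GC" + "T" * 6 + "CA"))     # [(3, 'A', 7), (12, 'T', 6)]
```

The first line reproduces the classification in the text. Mono-, di-, tetra-, and pentanucleotide units shift the frame, whereas tri- and hexanucleotide units preserve it.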
Molecular phylogenetic studies have not yet reached a consensus on the placement of Ginkgoales, represented by its only living species, Ginkgo biloba (common name: ginkgo). At least six conflicting placements of ginkgo have been proposed. This study used a chloroplast phylogenomic approach to examine factors that may lead to such conflicting placements. We found that the type of sequence used in the analyses was the most critical factor. The placement of ginkgo varied among trees inferred from nucleotide (NU) sequences. It depended notably on the breadth of taxon sampling, the tree-building method, codon positions, and the position of Gnetopsida (common name: gnetophytes). Whether gnetophytes were included in or excluded from the data sets also affected the placement. In contrast, trees inferred from amino acid (AA) sequences consistently supported the monophyly of a clade formed by ginkgo and Cycadales (common name: cycads), regardless of which factors were examined. Our site-stripping analysis further revealed high substitution saturation in NU sequences, derived mainly from third codon positions. This saturation contributed to the variable placements of ginkgo. In summary, the factors we surveyed did not affect results inferred from AA sequences. The congruent topologies of our AA trees therefore give more confidence in the ginkgo–cycad sister-group hypothesis.
phylogenomics; cycads; chloroplast; seed plants; ginkgo
The Drosophila Y chromosome is a degenerated, heterochromatic chromosome with few functional genes. Despite this, natural variation on the Y chromosome of D. melanogaster has substantial trans-acting effects on the regulation of X-linked and autosomal genes. It is not clear, however, whether these genes are simply a random subset of the genome. Alternatively, specific functional properties might make them susceptible to regulation by Y-linked variation. Here, we present a meta-analysis of four previously published microarray studies of Y-linked regulatory variation (YRV) in D. melanogaster. We show that YRV genes are far from a random subset of the genome. Compared with non-YRV genes, they are more likely to be in repressive chromatin contexts, to be expressed tissue-specifically, and to vary in expression within and between species. Furthermore, YRV genes are more likely than non-YRV genes to be associated with the nuclear lamina. They are also generally more likely to be close to each other in the nucleus, although not along chromosomes. Taken together, these results suggest that variation on the Y chromosome plays a role in modifying how the genome is distributed across chromatin compartments. It may act either via changes in the distribution of DNA-binding proteins or via changes in the spatial arrangement of the genome in the nucleus.
gene expression; heterochromatin; evolution
Prokaryote evolution involves extensive loss and gain of genes, which leads to substantial differences in gene repertoires even among closely related organisms. Across a wide range of phylogenetic depths, gene frequency distributions in prokaryotic pangenomes have a characteristic asymmetrical U-shape. This distribution comprises a core of (nearly) universal genes, a “shell” of moderately common genes, and a “cloud” of rare genes. We employ mathematical modeling to investigate evolutionary processes that might underlie this universal pattern. We analyzed gene frequency distributions for almost 400 groups of 10 bacterial or archaeal species each, spanning a broad range of evolutionary distances. These distributions were fit to steady-state infinite-allele models based on the distribution of gene replacement rates and the phylogenetic tree relating the species in each group. The fits of the theoretical frequency distributions to the empirical ones yield model parameters and estimates of goodness of fit. Using the Akaike Information Criterion, we show that the neutral model of genome evolution, with the same replacement rate for all genes, can be confidently rejected. We tested three models with purifying selection. The best fit came from the model in which the distribution of replacement rates is derived from a stochastic population model with additive per-gene fitness. The selection strength estimated from the fits declines with evolutionary divergence while staying well outside the neutral regime. Some other universal distributions of genomic variables, for example, the distribution of paralogous gene family membership, show no such signature. In contrast, these findings indicate that the gene frequency distribution is substantially affected by selection.
gene frequency distribution; steady genome model; goodness of fit; evolution mechanisms
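Model comparison with the Akaike Information Criterion, as used above to reject the neutral model, is based on AIC = 2k - 2 ln L. Here k is the number of free parameters and L is the maximized likelihood. Lower AIC is better, and Akaike weights express the relative support for each model. The sketch below shows the arithmetic. The log-likelihoods and parameter counts are hypothetical placeholders, not values from the study.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def akaike_weights(aic_values):
    """Relative likelihoods exp(-delta/2), normalized to sum to 1."""
    best = min(aic_values)
    rel = [math.exp(-(a - best) / 2) for a in aic_values]
    total = sum(rel)
    return [r / total for r in rel]


# Hypothetical fits: a neutral model (one shared replacement rate) versus
# a selection model with two extra parameters.
models = {"neutral": (-1250.0, 1), "selection": (-1230.0, 3)}
scores = {name: aic(ll, k) for name, (ll, k) in models.items()}
print(scores)                                  # {'neutral': 2502.0, 'selection': 2466.0}
print(akaike_weights(list(scores.values())))
```

With an AIC difference of 36, almost all of the weight falls on the selection model. The 2k term guards against preferring it merely because it has more parameters.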
About 1 million people worldwide die each year from mosquito-borne diseases, so understanding how mosquitoes identify their hosts through olfaction is of major importance. The role of odorant binding proteins (OBPs) in the primary molecular events of mosquito olfaction is becoming an important focus of research in this area. Here, we present a comprehensive comparative genomics study of OBPs in three disease-transmitting mosquito species: Anopheles gambiae, Aedes aegypti, and Culex quinquefasciatus. The study starts with the identification of 110 new OBPs in these three genomes. We have characterized their genomic distribution and their orthologous and phylogenetic relationships. The diversity and expansion observed in the Aedes and Culex genomes suggest that the OBP gene family acquired functional diversity concurrently with the functional constraints imposed on these two species. We have also characterized sequences with unique features in mosquito genomes. These include the “two-domain OBPs” (previously known as Atypical OBPs) and the “MinusC OBPs.” This extensive comparative genomics analysis thus provides useful primary insights into the role of OBPs in the molecular adaptations of the mosquito olfactory system. It could also provide further clues for identifying potential targets for insect repellents and attractants.
odorant binding proteins; OBP; mosquito; Culex quinquefasciatus; Aedes aegypti; Anopheles gambiae; olfaction; phylogeny
Microsatellites, or simple sequence repeats (SSRs), are common and widespread DNA elements in the genomes of many organisms. However, their dynamics in genome evolution are unclear, and they are generally thought to evolve neutrally. The growing number of genome sequences, together with dated phylogenies, makes it possible to study the evolution of these repetitive DNA elements over evolutionary time scales. They can then be used to compare rates of genome evolution. We show that SSRs in insects can be retained for several hundred million years, and that some types of microsatellites appear to be retained longer than others. Comparing dipteran with hymenopteran species, we found very similar patterns of SSR loss during their evolution, but the two taxa differ profoundly in rate. Relative to divergence time, Diptera lost SSRs twice as fast as Hymenoptera. SSR loss on the Drosophila melanogaster X chromosome was higher than on the other chromosomes. However, after accounting for generation time, Diptera show an 8.5-fold slower rate of SSR loss than Hymenoptera. In contrast to previous studies, this suggests faster genome evolution in Hymenoptera. This shows that differences in generation time can have a profound effect. Several factors that differ markedly between Hymenoptera and Diptera could facilitate faster genome evolution in Hymenoptera. We discuss these factors in light of our results on the D. melanogaster X chromosome. Like the genomes of haploid males in haplodiploid Hymenoptera, this chromosome is present in a single copy in males. Furthermore, large numbers of SSRs are found in syntenic positions and could thus be exploited as a tool to investigate genome structure and evolution.
microsatellite conservation; genome evolution; social Hymenoptera; Drosophila; mosquitoes; generation time; haplodiploidy; synteny
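The two rate comparisons in the SSR abstract above (twice as fast per unit of divergence time, yet 8.5-fold slower per generation) can be linked with simple arithmetic. The sketch below assumes, as a simplification not spelled out in the abstract, that a per-generation loss rate equals the per-year rate multiplied by the generation time in years. Within that toy model only, the two quoted ratios together imply how much longer Hymenopteran generations would have to be than Dipteran ones. The function names are hypothetical, and this is an illustration, not the study's method.

```python
"""Toy conversion between per-year and per-generation SSR loss rates.

Assumption (a simplification, not taken from the study): the rate per
generation equals the rate per year multiplied by the generation time in
years. Function names are hypothetical.
"""


def per_generation_rate(rate_per_year: float, generation_time_years: float) -> float:
    """Convert a loss rate per year into a loss rate per generation."""
    if generation_time_years <= 0:
        raise ValueError("generation time must be positive")
    return rate_per_year * generation_time_years


def implied_generation_time_ratio(per_year_ratio: float, per_generation_ratio: float) -> float:
    """Return T_b / T_a, given rate ratios of taxon a relative to taxon b.

    From r_gen = r_year * T it follows that
    (r_gen_a / r_gen_b) = (r_year_a / r_year_b) * (T_a / T_b),
    so T_b / T_a = per_year_ratio / per_generation_ratio.
    """
    if per_year_ratio <= 0 or per_generation_ratio <= 0:
        raise ValueError("rate ratios must be positive")
    return per_year_ratio / per_generation_ratio


if __name__ == "__main__":
    # Diptera relative to Hymenoptera, using the ratios quoted in the abstract:
    # 2x faster per year and 8.5x slower (ratio 1 / 8.5) per generation.
    ratio = implied_generation_time_ratio(2.0, 1 / 8.5)
    print(f"Implied Hymenoptera/Diptera generation-time ratio: {ratio:.1f}")
```

Run as a script, this prints a ratio of 17.0. The figure only holds under the stated multiplicative assumption. The study's actual normalization may differ, so it should be read as a consistency check on the two ratios, not as an estimate of real generation times.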
Rhodopsin-containing marine microbes such as those in the class Flavobacteriia play a pivotal role in the biogeochemical cycles of the euphotic zone (Fuhrman JA, Schwalbach MS, Stingl U. 2008. Proteorhodopsins: an array of physiological roles? Nat Rev Microbiol. 6:488–494). Deciphering the genome information of flavobacteria and assessing the diversity and ecological impact of microbial rhodopsins are important for understanding and preserving global ecosystems. The genome sequence of the orange-pigmented marine flavobacterium Nonlabens dokdonensis (basonym: Donghaeana dokdonensis) DSW-6 was determined. As a marine photoheterotroph, DSW-6 carries in its genome physiological features that allow survival in oligotrophic environments. In addition to a proteorhodopsin gene and a number of photolyase or cryptochrome genes, the sequence analysis uncovered a gene encoding an unexpected type of microbial rhodopsin containing a unique motif. Homologs of the novel rhodopsin gene were found in other flavobacteria, alphaproteobacteria, a species of Cytophagia, a deinococcus, and even a eukaryotic diatom. They all contain the characteristic NQ motif and form a phylogenetically distinct group. Expression analysis of this rhodopsin gene in DSW-6 indicated that it is induced at high NaCl concentrations, as well as in the presence of light and the absence of nutrients. Genomic and metagenomic surveys demonstrate the diversity of NQ rhodopsins in nature and the prevalent occurrence of the encoding genes among microbial communities inhabiting hypersaline niches, suggesting their involvement in sodium metabolism and a sodium-adapted lifestyle.
heterotrophic picoplankton; Bacteroidetes; bacteriorhodopsin; xanthorhodopsin; sodium pump; metagenome
The impact of transposable elements (TEs) on genome structure, plasticity, and evolution is still not well understood. The recent availability of complete genome sequences makes it possible to gain new insights into the evolutionary dynamics of TEs from the phylogenetic analysis of their multiple copies in a wide range of species. However, this source of information is not always fully exploited. Here, we show how the history of transposition activity may be reconstructed, both qualitatively and quantitatively, by considering the distribution of transposition events in the phylogenetic tree along with the tree topology. Using statistical models developed to infer speciation and extinction rates in species phylogenies, we demonstrate that it is possible to estimate the past transposition rate of a TE family, as well as how this rate varies with time. This methodological framework may not only facilitate the interpretation of genomic data but also serve as a basis for developing new theoretical and statistical models.
transposition activity; phylogeny; branching process; repeated sequences
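The idea in the TE abstract above can be made concrete by treating each branching event in a TE-copy tree as a transposition event. Rate estimators from species-diversification models then apply directly. The sketch below uses the simplest such model, a pure-birth (Yule) process in which copies are never lost. This is a swapped-in stand-in for the speciation and extinction models the abstract refers to, not the authors' implementation. It makes three assumptions: the tree is ultrametric and fully bifurcating, it is described only by its internal-node ages, and the likelihood is conditioned on the root. Under these assumptions the maximum-likelihood constant rate is (n - 2)/L, where n is the number of copies and L the total branch length. A piecewise-constant variant gives a crude view of how the rate changes with time. All function names are hypothetical.

```python
"""Pure-birth (Yule) estimate of a transposition rate from a TE-copy tree.

Assumptions: the tree is ultrametric and fully bifurcating, each internal
node is one transposition event, copies are never lost (no extinction),
and the likelihood is conditioned on the root. Names are hypothetical.
"""
from typing import List, Sequence, Tuple


def lineage_intervals(node_ages: Sequence[float]) -> List[Tuple[float, float, int]]:
    """Return (older_age, younger_age, n_lineages) intervals from root to present."""
    ages = sorted(node_ages, reverse=True)  # root (oldest node) first
    bounds = ages + [0.0]  # the present is age 0
    # Just below the (k+1)-th oldest node there are k + 2 lineages.
    return [(bounds[k], bounds[k + 1], k + 2) for k in range(len(ages))]


def yule_rate(node_ages: Sequence[float]) -> float:
    """Maximum-likelihood constant rate: (n_tips - 2) / total branch length."""
    n_tips = len(node_ages) + 1
    if n_tips < 3:
        raise ValueError("need at least three copies (two internal nodes)")
    total_length = sum(n * (old - young) for old, young, n in lineage_intervals(node_ages))
    return (n_tips - 2) / total_length


def windowed_rates(node_ages: Sequence[float], breakpoints: Sequence[float]) -> List[float]:
    """Piecewise-constant rates: events / lineage-time in each [lo, hi) age window."""
    edges = sorted(breakpoints)
    intervals = lineage_intervals(node_ages)
    events = sorted(node_ages, reverse=True)[1:]  # drop the root (conditioned on)
    rates = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Lineage-time ("exposure") falling inside this window.
        exposure = sum(
            n * max(0.0, min(old, hi) - max(young, lo)) for old, young, n in intervals
        )
        k = sum(1 for a in events if lo <= a < hi)
        rates.append(k / exposure if exposure > 0 else float("nan"))
    return rates


if __name__ == "__main__":
    # Toy tree ((A,B),C): root at age 2.0, (A,B) split at age 1.0.
    ages = [2.0, 1.0]
    print(yule_rate(ages))                        # 1 / 5 = 0.2
    print(windowed_rates(ages, [0.0, 1.5, 2.0]))  # [0.25, 0.0]
```

The piecewise version is the maximum-likelihood solution when the rate is held constant within each window: the event count divided by the lineage-time in that window. Allowing copy loss requires a full birth-death likelihood, which this sketch deliberately omits.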
We examine the advantages of going beyond sequence similarity by using protein three-dimensional (3D) structure prediction, followed by quaternary structure prediction (docking) of the inferred 3D structures. Together, these help evaluate whether comparable sequences can fold into homologous structures with sufficient lateral associations for quaternary structure formation. Our test case is the major vault protein (MVP), which oligomerizes in multiple copies to form barrel-like vault particles and is relatively widespread among eukaryotes. We used the iterative threading assembly refinement server (I-TASSER) to predict whether putative MVP sequences identified by BLASTp and PSI-BLAST are structurally similar to the experimentally determined rodent MVP tertiary structures. Two identical copies of each predicted tertiary structure from I-TASSER are then analyzed by RosettaDock to test whether a pair-wise association occurs, and hence whether the oligomeric vault complex is likely to form for a given MVP sequence. Positive controls for the method are the experimentally determined rat (Rattus norvegicus) vault X-ray crystal structure and the purple sea urchin (Strongylocentrotus purpuratus) MVP sequence, which forms experimentally observed vaults. These and two kinetoplastid MVP structural homologs were predicted with high confidence values, and RosettaDock predicted that these MVP sequences would dock laterally and could therefore form oligomeric vaults. As the negative control, I-TASSER did not predict an MVP-like structure from a randomized rat MVP sequence, even when constrained to the rat MVP crystal structure (PDB:2ZUO), further validating the method. The protocol identified six putative homologous MVP sequences in the heterolobosean Naegleria gruberi, within the excavate kingdom. Two of these sequences are predicted to be structurally similar to rat MVP, despite being more than 300 residues shorter. The method can be used generally to help test predictions of homology via structural analysis.
homology modeling; BLAST; I-TASSER; RosettaDock; Naegleria gruberi
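The negative control in the MVP abstract above relies on a randomized version of the rat MVP sequence. The abstract does not say how that sequence was generated. A common choice, assumed here, is a uniform shuffle of the residues, which keeps length and amino-acid composition but destroys the order that encodes the fold. The sketch below covers only that step. Submitting the result to I-TASSER is not shown, and no I-TASSER or RosettaDock interface is implied. The function names and the toy sequence are hypothetical.

```python
"""Composition-preserving shuffle of a protein sequence as a negative control.

Assumption: the randomization is a uniform residue shuffle; the abstract
does not specify the procedure. Names and the toy sequence are hypothetical.
"""
import random
from collections import Counter
from typing import Optional


def shuffled_control(sequence: str, seed: Optional[int] = None) -> str:
    """Return a random permutation of the residues in `sequence`.

    Length and amino-acid composition are preserved; residue order is not.
    """
    residues = list(sequence)
    random.Random(seed).shuffle(residues)  # seeded for reproducibility
    return "".join(residues)


def fraction_identical(a: str, b: str) -> float:
    """Fraction of positions (compared index by index, no alignment) that agree."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    toy = "MKVLAAGIVLLSTAQPEDRW"  # hypothetical toy peptide, not MVP
    control = shuffled_control(toy, seed=1)
    assert Counter(control) == Counter(toy)  # same composition
    print(f"positional identity to original: {fraction_identical(toy, control):.2f}")
```

Fixing the seed makes the control reproducible. Generating several shuffles with different seeds would give a small ensemble of negative controls instead of a single one.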
Robustness is considered a ubiquitous property of living systems at all levels of organization, and small noncoding RNA (sncRNA) is a genuine model for its study at the molecular level. In this communication, we question whether microRNA precursors (pre-miRNAs) are actually structurally robust, as previously suggested. We found that natural pre-miRNAs are not more robust than expected under an appropriate null model. On the contrary, we found that eukaryotic pre-miRNAs show a significant enrichment in conformational flexibility at the thermal equilibrium of the molecule, that is, in their plasticity. Our results further support the view that sncRNAs are selected for functional diversification and evolvability.
conformational flexibility; evolvability; noncoding RNA; secondary structure; thermodynamics
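The pre-miRNA abstract above compares natural sequences against a null model. In practice this means asking where an observed statistic falls within a distribution generated under that model. The sketch below is a generic add-one empirical p-value. It assumes a precomputed sample of null scores; the abstract does not say how the null ensemble is built or which robustness and plasticity measures are used. A one-sided "greater" test on a flexibility score corresponds to the enrichment claim. A non-significant "greater" test on a robustness score corresponds to "not more robust than expected". The function name and all numbers are hypothetical.

```python
"""Empirical one-sided p-value of an observed statistic against a null sample.

Assumption: `null` holds the same statistic (e.g., a robustness or
flexibility score) computed on sequences drawn from some null model,
generated elsewhere. Names and numbers are hypothetical.
"""
from typing import Sequence


def empirical_p_value(observed: float, null: Sequence[float], tail: str = "greater") -> float:
    """Return (1 + #null values at least as extreme) / (1 + len(null)).

    The +1 terms count the observation itself, so the p-value is never zero.
    tail="greater" tests for enrichment (observed unusually high);
    tail="less" tests for depletion (observed unusually low).
    """
    if len(null) == 0:
        raise ValueError("null sample must not be empty")
    if tail == "greater":
        extreme = sum(1 for x in null if x >= observed)
    elif tail == "less":
        extreme = sum(1 for x in null if x <= observed)
    else:
        raise ValueError("tail must be 'greater' or 'less'")
    return (1 + extreme) / (1 + len(null))


if __name__ == "__main__":
    # Hypothetical scores for illustration only, not data from the study.
    null_scores = [0.41, 0.38, 0.45, 0.50, 0.36, 0.44, 0.39, 0.47, 0.42, 0.40]
    print(empirical_p_value(0.49, null_scores, "greater"))
```

The add-one form keeps the estimate conservative. The smallest attainable p-value is 1/(N + 1) for N null draws, so the size of the null ensemble limits how strong an enrichment claim can be.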