Candida glabrata follows C. albicans as the second or third most prevalent cause of candidemia worldwide. These two pathogenic yeasts are distantly related: C. glabrata belongs to the Nakaseomyces, a group more closely related to Saccharomyces cerevisiae. Although C. glabrata was thought to be the only pathogenic Nakaseomyces, two new pathogens have recently been described within this group: C. nivariensis and C. bracarensis. To gain insight into the genomic changes underlying the emergence of virulence, we sequenced the genomes of these two species and of three other, non-pathogenic Nakaseomyces, and compared them to other sequenced yeasts.
Our results indicate that the two new pathogens are more closely related to the non-pathogenic N. delphensis than to C. glabrata. We uncover duplications and accelerated evolution that specifically affected genes in the lineage preceding the group containing N. delphensis and the three pathogens, which may provide clues to the higher propensity of this group to infect humans. Finally, the number of Epa-like adhesins is specifically enriched in the pathogens, particularly in C. glabrata.
Remarkably, some features thought to result from the adaptation of C. glabrata to a pathogenic lifestyle are present throughout the Nakaseomyces, indicating that they are rather ancient adaptations to other environments. Phylogeny suggests that human pathogenesis evolved independently several times within the clade. The expansion of the EPA gene family in pathogens establishes an evolutionary link between adhesion and virulence phenotypes. Our analyses thus shed light on the relationships between virulence and the recent genomic changes that occurred within the Nakaseomyces.
Sequence Accession Numbers
Nakaseomyces delphensis: CAPT01000001 to CAPT01000179
Candida bracarensis: CAPU01000001 to CAPU01000251
Candida nivariensis: CAPV01000001 to CAPV01000123
Candida castellii: CAPW01000001 to CAPW01000101
Nakaseomyces bacillisporus: CAPX01000001 to CAPX01000186
Candida glabrata; Fungal pathogens; Nakaseomyces; Yeast genomes; Yeast evolution
Carbohydrate-active enzymes (CAZymes) are involved in the metabolism of glycoconjugates, oligosaccharides, and polysaccharides and, in the case of plant pathogens, in the degradation of the host cell wall and storage compounds. We performed an in silico analysis of CAZymes predicted from the genomes of seven Pythium species (Py. aphanidermatum, Py. arrhenomanes, Py. irregulare, Py. iwayamai, Py. ultimum var. ultimum, Py. ultimum var. sporangiiferum and Py. vexans) using the “CAZymes Analysis Toolkit” and the “Database for Automated Carbohydrate-active Enzyme Annotation” and compared them to previously published oomycete genomes. Growth of Pythium spp. was assessed in a minimal medium containing selected carbon sources that are usually present in plants. The in silico analyses, coupled with our in vitro growth assays, suggest that most of the predicted CAZymes are involved in the metabolism of the oomycete cell wall, with starch and sucrose serving as the main carbohydrate sources for growth of these plant pathogens. The genomes of Pythium spp. also encode pectinases and cellulases that facilitate degradation of the plant cell wall and are important in hyphal penetration; however, the species examined in this study lack the genes required for the complete saccharification of these carbohydrates for use as a carbon source. Genes encoding enzymes for xylan, xyloglucan, (galacto)(gluco)mannan and cutin degradation were absent or infrequent in Pythium spp. Comparative analyses of predicted CAZymes in oomycetes indicated distinct evolutionary histories. Furthermore, CAZyme gene families were not uniformly distributed among the Pythium genomes, suggesting independent gene loss events that reflect the polyphyletic relationships among some of the species.
Integrative genomics predictors, which perform well in predicting bacterial essential genes, are infeasible for most species because the required data sources are limited. We developed a universal approach and tool, designated Geptop, based on orthology and phylogeny, to provide gene essentiality annotations. In a series of tests, Geptop yielded higher area under the curve (AUC) scores for receiver operating characteristic (ROC) curves than the integrative approaches. In ten-fold cross-validation on randomly shuffled samples, Geptop yielded an AUC of 0.918, and in cross-organism predictions for 19 organisms it yielded AUC scores between 0.569 and 0.959. A test on the recently determined essential gene dataset of Porphyromonas gingivalis, which belongs to a phylum different from those of all 19 bacteria above, gave an AUC of 0.77. Therefore, Geptop can be applied to any bacterial species whose genome has been sequenced. Compared with the essential genes uniquely identified by lethal screening, the essential genes predicted only by Geptop are associated with more protein-protein interactions, especially in the three bacteria with lower AUC scores (<0.7). This further supports, to some extent, the reliability and feasibility of our method. The web server and standalone version of Geptop are available free of charge at http://cefg.uestc.edu.cn/geptop/. The tool has been run on 968 bacterial genomes, and the results are accessible at the website.
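An AUC value like those above can be read as the probability that a randomly chosen essential gene receives a higher score than a randomly chosen non-essential gene. The following is a minimal sketch of that rank-based calculation, assuming a list of predicted essentiality scores and binary labels (1 = essential); the function name and the scores are illustrative and are not part of Geptop.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank-based ROC AUC: the fraction of (essential, non-essential) gene pairs
    in which the essential gene gets the higher score; ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one essential and one non-essential gene")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores for four genes, two of them experimentally essential.
print(roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))  # 0.75
```

The pairwise loop is quadratic in the number of genes; for genome-scale sets, a sort-based Mann-Whitney U computation gives the same value more efficiently.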
The exponential growth of the biomedical literature is making the need for efficient, accurate text-mining tools increasingly clear. The identification of named biological entities in text is a central and difficult task. We have developed an efficient algorithm and implementation of a dictionary-based approach to named entity recognition, which we here use to identify names of species and other taxa in text. The tool, SPECIES, is more than an order of magnitude faster than, and as accurate as, existing tools. Precision and recall were assessed both on an existing gold-standard corpus and on a new corpus of 800 abstracts, which were manually annotated after the development of the tool. The corpus comprises abstracts from journals selected to represent many taxonomic groups, which gives insight into which types of organism names are hard to detect and which are easy. Finally, we have tagged organism names in the entire Medline database and developed a web resource, ORGANISMS, that makes the results accessible to the broad community of biologists. The SPECIES software is open source and can be downloaded from http://species.jensenlab.org along with dictionary files and the manually annotated gold-standard corpus. The ORGANISMS web resource can be found at http://organisms.jensenlab.org.
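The core idea of dictionary-based tagging can be illustrated with a greedy longest-match lookup over token sequences. This sketch is not the SPECIES implementation, which is far more optimized and handles orthographic variants and ambiguous names. The toy dictionary and all function names are hypothetical; 562 and 9606 are used as NCBI Taxonomy identifiers purely for illustration.

```python
import re
from typing import Dict, List, Tuple

TOKEN = re.compile(r"[A-Za-z0-9]+")


def normalize(name: str) -> Tuple[str, ...]:
    """Lower-cased alphanumeric tokens, so 'E. coli' and 'e coli' compare equal."""
    return tuple(t.lower() for t in TOKEN.findall(name))


def build_dictionary(names: Dict[str, int]) -> Dict[Tuple[str, ...], int]:
    return {normalize(n): taxid for n, taxid in names.items()}


def tag_names(text: str, dictionary: Dict[Tuple[str, ...], int]) -> List[Tuple[str, int]]:
    """Greedy left-to-right scan that prefers the longest dictionary name at each token."""
    max_len = max((len(k) for k in dictionary), default=0)
    tokens = list(TOKEN.finditer(text))
    hits = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(t.group().lower() for t in tokens[i:i + n])
            if key in dictionary:
                start, end = tokens[i].start(), tokens[i + n - 1].end()
                hits.append((text[start:end], dictionary[key]))
                i += n
                break
        else:  # no dictionary name starts at this token
            i += 1
    return hits


# Toy dictionary mapping names to taxon identifiers.
DICTIONARY = build_dictionary({"Escherichia coli": 562, "E. coli": 562, "Homo sapiens": 9606})
print(tag_names("Both E. coli and Homo sapiens were studied.", DICTIONARY))
```

Matching on normalized token tuples keeps each lookup a single hash probe, which is one reason dictionary approaches can be fast; real taggers also need stop-word lists to avoid tagging common words that happen to be taxon names.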
Two fosmid libraries, totaling 13,200 clones, were obtained from bioreactor sludge of a petroleum refinery wastewater treatment system. Library screening based on PCR and biological activity assays revealed more than 400 positive clones for phenol degradation. From these, 100 clones were randomly selected for pyrosequencing in order to evaluate the biodegradation potential of the microorganisms present in the wastewater treatment plant, focusing mainly on novel genes and pathways of phenol and aromatic compound degradation. Sequence analysis of the selected clones yielded 129,635 reads at an estimated 17-fold coverage. Phylogenetic analysis showed Burkholderiales and Rhodocyclales to be the most abundant orders among the selected fosmid clones. MG-RAST analysis revealed a broad metabolic profile with important functions for wastewater treatment, including metabolism of aromatic compounds, nitrogen, sulphur and phosphorus. The 2,276 predicted proteins included phenol hydroxylases and catechol 2,3-dioxygenases, involved in the catabolism of aromatic compounds such as phenol, biphenyl, benzoate and phenylpropanoids. Sequencing of one 33 kb fosmid insert revealed the gene that allowed the host, Escherichia coli EPI300, to grow in the presence of aromatic compounds. Additionally, comparison of the whole fosmid sequence against bacterial genomes deposited in GenBank showed that about 90% of the sequence had no identity to known Proteobacteria sequences in the NCBI database. This study surveyed the functional potential of fosmid clones for aromatic compound degradation and contributed to our knowledge of the biodegradative capacity and pathways of microbial assemblages present in refinery wastewater treatment systems.
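The 17-fold figure follows the usual coverage relation C = N × L / G, with N reads of mean length L over G bases of target DNA. Neither the mean read length nor the total insert length is given above, so this sketch only illustrates the arithmetic. The 3.5 Mb target (roughly 100 fosmids of about 35 kb each) is a hypothetical assumption, not a reported value, and the function names are illustrative.

```python
def coverage(n_reads: int, mean_read_len: float, target_bp: float) -> float:
    """Expected fold coverage C = N * L / G."""
    return n_reads * mean_read_len / target_bp


def implied_read_length(n_reads: int, fold: float, target_bp: float) -> float:
    """Mean read length needed for `n_reads` reads to give `fold` coverage of `target_bp`."""
    return fold * target_bp / n_reads


# 129,635 reads and 17-fold coverage come from the text; 3.5e6 bp is a hypothetical target size.
print(round(implied_read_length(129_635, 17, 3.5e6)))
```

Under that assumed target size, the reported numbers imply a mean read length of roughly 460 bp; a different insert total would change the result proportionally.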
Dendrobium spp. are traditional Chinese medicinal plants, and their main active ingredients (polysaccharides and alkaloids) have pharmacological effects against gastritis, cancer, and aging. Previously, we confirmed endophytic xylariaceous fungi as the dominant fungi in several Dendrobium species from tropical regions of China. In the present study, the diversity, taxonomy, and distribution of culturable endophytic xylariaceous fungi associated with seven medicinal species of Dendrobium (Orchidaceae) were investigated. Among the 961 newly isolated endophytes, 217 xylariaceous fungi (morphotaxa) were identified using morphological and molecular methods. The phylogenetic tree constructed using nuclear ribosomal internal transcribed spacer (ITS), large subunit of ribosomal DNA (LSU), and beta-tubulin sequences divided these anamorphic xylariaceous isolates into at least 18 operational taxonomic units (OTUs). The diversity of the endophytic xylariaceous fungi in these seven Dendrobium species was estimated using Shannon and evenness indices; the results indicated that the dominant Xylariaceae taxa differed greatly among Dendrobium species, although common xylariaceous fungi were found in several of them. These findings suggest that different host plants in the same habitat exhibit preference and selectivity for their fungal partners. As culturable strains, these xylariaceous isolates may be important sources for the future screening of new natural products and for drug discovery.
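The Shannon and evenness indices mentioned above can be computed from isolate counts per OTU. This minimal sketch assumes the natural-log Shannon index H' = -Σ p_i ln p_i and Pielou's evenness J = H' / ln S, where S is the number of observed OTUs. The text does not specify which evenness index was used, and the counts below are hypothetical.

```python
import math
from typing import Sequence


def shannon(counts: Sequence[int]) -> float:
    """Shannon diversity H' = -sum(p_i * ln p_i) over OTUs with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def pielou_evenness(counts: Sequence[int]) -> float:
    """Pielou's evenness J = H' / ln(S), where S is the number of observed OTUs."""
    s = sum(1 for c in counts if c > 0)
    if s < 2:
        return float("nan")  # undefined with fewer than two OTUs
    return shannon(counts) / math.log(s)


# Hypothetical isolate counts per xylariaceous OTU in one host species.
counts = [12, 5, 3, 1]
print(round(shannon(counts), 3), round(pielou_evenness(counts), 3))
```

J approaches 1 when isolates are spread evenly across OTUs and falls toward 0 when one OTU dominates, which is how a strongly dominant taxon in a given host shows up in these indices.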
Hierarchical orthologous groups are defined as sets of genes that have descended from a single common ancestor within a taxonomic range of interest. Identifying such groups is useful in a wide range of contexts, including inference of gene function, study of gene evolution dynamics and comparative genomics. Hierarchical orthologous groups can be derived from reconciled gene/species trees, but because this is a computationally costly procedure, many phylogenomic databases instead work on the basis of pairwise gene comparisons (the “graph-based” approach). To our knowledge, there is only one published algorithm for graph-based hierarchical group inference, but both its theoretical justification and its performance in practice remain largely uncharacterised. We establish a formal correspondence between the orthology graph and hierarchical orthologous groups. Based on that, we devise GETHOGs (“Graph-based Efficient Technique for Hierarchical Orthologous Groups”), a novel algorithm to infer hierarchical groups directly from the orthology graph, thus requiring neither gene tree inference nor gene/species tree reconciliation. GETHOGs is shown to correctly reconstruct hierarchical orthologous groups when applied to perfect input, and several extensions with stringency parameters are provided to deal with imperfect input data. We demonstrate its competitiveness using both simulated and empirical data. GETHOGs is implemented as part of the freely available OMA standalone package (http://omabrowser.org/standalone). Furthermore, hierarchical groups inferred by GETHOGs (“OMA HOGs”) on >1,000 genomes can be interactively queried via the OMA browser (http://omabrowser.org).
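To make the link between an orthology graph and nested groups concrete, here is a deliberately naive sketch. It treats the groups at a given taxonomic level as the connected components of the orthology graph restricted to genes from species in that clade. This is not GETHOGs: it has no stringency parameters and would merge groups through a single spurious edge. The gene names, species and helper function are hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple


def groups_at_level(
    gene_species: Dict[str, str],
    orthology_edges: Iterable[Tuple[str, str]],
    clade: Set[str],
) -> List[FrozenSet[str]]:
    """Connected components (via union-find) of the orthology graph restricted
    to genes whose species belong to `clade`."""
    parent = {g: g for g, sp in gene_species.items() if sp in clade}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in orthology_edges:
        if a in parent and b in parent:  # ignore edges to genes outside the clade
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[ra] = rb

    components: Dict[str, Set[str]] = defaultdict(set)
    for g in parent:
        components[find(g)].add(g)
    return sorted((frozenset(c) for c in components.values()), key=sorted)


genes = {"h1": "human", "h2": "human", "m1": "mouse", "f1": "fly"}
edges = [("h1", "m1"), ("m1", "f1")]
print(groups_at_level(genes, edges, {"human", "mouse"}))          # mammal-level groups
print(groups_at_level(genes, edges, {"human", "mouse", "fly"}))   # deeper level
```

Moving to a broader clade only adds genes and edges, so each group at the narrower level is contained in a group at the broader one. That nesting is the defining property of hierarchical orthologous groups.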
A 6-chloronicotinic acid mineralizing bacterium was isolated from enrichment cultures originating from imidacloprid-contaminated soil samples. This member of the Bradyrhizobiaceae, designated strain SG-6C, hydrolytically dechlorinated 6-chloronicotinic acid to 6-hydroxynicotinic acid, which was then further metabolised via the nicotinic acid pathway. This metabolic pathway was confirmed by growth and resting cell assays using HPLC and LC-MS. A candidate gene for the initial dechlorination step, named cch2 (for 6-chloronicotinic acid chlorohydrolase), was identified by genome sequencing, and its function was confirmed using resting cell assays on E. coli heterologously expressing this gene. The 464-amino-acid enzyme was found to be a member of the metal-dependent hydrolase superfamily with similarities to the TRZ/ATZ family of chlorohydrolases. We also provide evidence that cch2 was mobilized into this bacterium by an Integrative and Conjugative Element (ICE), so that its product, 6-hydroxynicotinic acid, feeds into the existing nicotinic acid mineralization pathway.
Genes encoding proteins involved in sperm-egg interaction and fertilization exhibit particularly fast evolution and may participate in prezygotic species isolation. Some of them (ZP3, ADAM1, ADAM2, ACR and CD9) have individually been shown to evolve under positive selection, suggesting a role for positive Darwinian selection in sperm-egg interaction. However, the genes involved in this biological function have not been systematically and exhaustively studied from an evolutionary perspective, in particular across vertebrates with internal and external fertilization. Here we show that 33 of the 69 genes experimentally shown to be involved in fertilization in at least one vertebrate taxon are under positive selection. Moreover, we identified 17 pseudogenes and 39 genes that have at least one duplicate in one species. For 15 genes, we found no evidence of positive selection, gene copies or pseudogenes. Genes of teleosts, especially genes involved in sperm-oolemma fusion, appear to be more frequently under positive selection than genes of birds and eutherians. In contrast, pseudogenization, gene loss and gene gain are more frequent in eutherians. Thus, each of the 19 studied vertebrate species exhibits a unique signature characterized by gene gain and loss, as well as by the positions of amino acids under positive selection. Reflecting these clade-specific signatures, teleosts and eutherian mammals are recovered as clades in a parsimony analysis. Interestingly, the same analysis places Xenopus apart from teleosts, with which it shares primitive external fertilization, and groups it with amniotes (which share internal fertilization), suggesting that the external or internal environmental conditions of germ cell interaction may not be the only factors driving the evolution of fertilization genes.
Our work should improve our understanding of the fertilization process and of the establishment of reproductive barriers, for example by offering new leads for experiments on genes identified as positively selected.
Ling-zhi, a widely cultivated fungus in China, has a long history in traditional Chinese medicine. Although the name ‘Ganoderma lucidum’, a species originally described from England, has been applied to this fungus, the two are not the same species. This study aims to clarify the identity of this medicinally and economically important fungus. Specimens of Ling-zhi from China (field collections and cultivated basidiomata of the Chinese ‘G. lucidum’), of G. lucidum from the UK and of other related Ganoderma species were examined both morphologically and molecularly. High variability in basidioma morphology was found in the cultivated specimens of the Chinese ‘G. lucidum’, while some microscopic characters were more or less consistent, namely short clavate cutis elements, Bovista-type ligative hyphae and strongly echinulate basidiospores. These characters were also found in the holotype of G. sichuanense, a species originally described from Sichuan, China, and in recent collections made at the type locality of the species, which matched the diagnostic characters in the protologue. For comparison, specimens of the closely related species G. lucidum, G. multipileum, G. resinaceum, G. tropicum and G. weberianum were also examined. DNA sequences were obtained from field collections, cultivated basidiomata and living strains of the Chinese ‘G. lucidum’, from specimens from the type locality of G. sichuanense, and from specimens of the closely related species studied. Combined three-gene analyses (ITS+IGS+rpb2) were performed, and the results indicated that the Chinese ‘G. lucidum’ shared almost identical sequences with G. sichuanense. Based on both morphological and molecular data, the Chinese ‘G. lucidum’ (Ling-zhi) is considered conspecific with G. sichuanense. Detailed morphological descriptions and illustrations are provided, together with a discussion of the nomenclatural implications.
The kingdom Fungi is estimated to include 1.5 million or more species, playing key roles as decomposers, mutualists, and parasites in every biome on the earth. To comprehensively understand the diversity and ecology of this huge kingdom, DNA barcoding targeting the internal transcribed spacer (ITS) region of the nuclear ribosomal repeat has been regarded as a prerequisite procedure. By extensively surveying ITS sequences in public databases, we designed new ITS primers with improved coverage across diverse taxonomic groups of fungi compared to existing primers. An in silico analysis based on public sequence databases indicated that the newly designed primers matched 99% of ascomycete and basidiomycete ITS taxa (species, subspecies or varieties), causing little taxonomic bias toward either fungal group. Two of the newly designed primers could inhibit the amplification of plant sequences and would enable the selective investigation of fungal communities in mycorrhizal associations, soil, and other types of environmental samples. Optimal PCR conditions for the primers were explored in an in vitro investigation. The new primers developed in this study will provide a basis for ecological studies on the diversity and community structures of fungi in the era of massive DNA sequencing.
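An in silico coverage check like the one described above amounts to asking, for each reference sequence, whether some site matches the primer within a mismatch budget, with degenerate positions written in IUPAC codes. This is a simplified sketch: it scans the forward strand only, treats all mismatch positions equally (real evaluations usually weight 3'-end mismatches more heavily), and uses a toy primer rather than any primer from the study. Function names are hypothetical.

```python
from typing import Optional, Sequence

# IUPAC nucleotide codes and the template bases each one accepts.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def mismatches(primer: str, site: str) -> int:
    """Positions where the template base is not allowed by the primer's IUPAC code."""
    return sum(1 for p, s in zip(primer, site) if s not in IUPAC[p])


def best_site(primer: str, template: str) -> Optional[int]:
    """Fewest mismatches over all primer-length windows of the template (forward strand)."""
    primer, template = primer.upper(), template.upper()
    n = len(primer)
    if len(template) < n:
        return None
    return min(mismatches(primer, template[i:i + n]) for i in range(len(template) - n + 1))


def fraction_matched(primer: str, templates: Sequence[str], max_mismatches: int = 0) -> float:
    """Share of templates containing a site within `max_mismatches` of the primer."""
    hits = 0
    for t in templates:
        m = best_site(primer, t)
        if m is not None and m <= max_mismatches:
            hits += 1
    return hits / len(templates)


# Toy degenerate primer and two toy templates.
print(fraction_matched("ACGN", ["TTACGTT", "TTTTTTT"]))  # 0.5
```

Running the same function over per-group template sets (for example ascomycete versus basidiomycete ITS sequences) is one way to expose the kind of taxonomic bias the text discusses.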
Multi-functional enzymes are enzymes that perform multiple physiological functions. Characterizing and identifying multi-functional enzymes is critical for understanding communication and cooperation between different functions and pathways within a complex cellular system or between cells. In the present study, we collected 6,799 multi-functional enzymes reported in the literature and systematically characterized them from structural, functional, and evolutionary perspectives. We found that four physicochemical properties, namely charge, polarizability, hydrophobicity, and solvent accessibility, are important for the characterization of multi-functional enzymes. Accordingly, a combined support vector machine and random forest model was constructed, with which 6,956 potential novel multi-functional enzymes were identified from the ENZYME database. Moreover, multi-functional enzymes were found to be unevenly distributed among species, with Bacteria having relatively more multi-functional enzymes than Archaebacteria and Eukaryota. Comparative analysis indicated that multi-functional enzymes underwent fluctuating gene gain and loss during the evolution from S. cerevisiae to H. sapiens. Further pathway analyses indicated that a majority of multi-functional enzymes were well preserved in catalyzing several essential cellular processes, for example, the metabolism of carbohydrates, nucleotides, and amino acids. In addition, a database of known multi-functional enzymes and a server for novel multi-functional enzyme prediction were constructed and are freely accessible at http://bioinf.xmu.edu.cn/databases/MFEs/index.htm.
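Physicochemical properties such as the four named above are often turned into classifier inputs by grouping residues into three classes per property and measuring the composition of each class. Here is a minimal sketch of that encoding. The groupings are the ones commonly used for composition/transition/distribution (CTD) descriptors, and using them here is an assumption, since the text does not describe the exact feature set fed to the SVM and random forest. Names are illustrative.

```python
from typing import Dict, Tuple

# Three-class residue groupings commonly used in CTD descriptors (assumed, not from the study).
GROUPS: Dict[str, Dict[str, str]] = {
    "hydrophobicity": {"polar": "RKEDQN", "neutral": "GASTPHY", "hydrophobic": "CLVIMFW"},
    "polarizability": {"low": "GASDT", "medium": "CPNVEQIL", "high": "KMHFRYW"},
    "charge": {"positive": "KR", "neutral": "ANCQGHILMFPSTWYV", "negative": "DE"},
    "solvent_accessibility": {"buried": "ALFCGIVW", "exposed": "RKQEND", "intermediate": "MPSTHY"},
}
STANDARD = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq: str) -> Dict[Tuple[str, str], float]:
    """Fraction of standard residues in each class of each property (12 features)."""
    residues = [aa for aa in seq.upper() if aa in STANDARD]
    if not residues:
        raise ValueError("sequence has no standard amino acids")
    features = {}
    for prop, classes in GROUPS.items():
        for name, members in classes.items():
            features[(prop, name)] = sum(aa in members for aa in residues) / len(residues)
    return features


print(composition("MKKLDDEAV")[("charge", "negative")])
```

Transition and distribution descriptors reuse the same groupings to capture how classes alternate and where they occur along the chain. A combined model could then, for instance, average the class probabilities from the two classifiers trained on these vectors.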
Archaeal and bacterial ribosomes contain more than 50 proteins, including 34 that are universally conserved in the three domains of cellular life (bacteria, archaea, and eukaryotes). Despite the high sequence conservation, annotation of ribosomal (r-) protein genes is often difficult because of their short lengths and biased sequence composition. We developed an automated computational pipeline for identification of r-protein genes and applied it to 995 completely sequenced bacterial and 87 archaeal genomes available in the RefSeq database. The pipeline employs curated seed alignments of r-proteins to run position-specific scoring matrix (PSSM)-based BLAST searches against six-frame genome translations, mitigating possible gene annotation errors. Using this pipeline, we performed a census of prokaryotic r-protein complements, enumerated missing and paralogous r-proteins, and analyzed the distribution of r-protein genes among chromosomal partitions. Phyletic patterns of bacterial and archaeal r-protein genes were mapped onto phylogenetic trees reconstructed from concatenated alignments of r-proteins, revealing a history that likely involved multiple independent gains and losses. These alignments, available for download, can be used as search profiles to improve genome annotation of r-proteins and for further comparative genomics studies.
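Searching six-frame genome translations rather than existing gene calls means that r-protein genes missed or mis-annotated by gene finders can still be detected. Here is a minimal sketch of six-frame translation with the standard genetic code (NCBI translation table 1). It is illustrative only: the PSSM-based BLAST search itself is not reproduced, and the function names are hypothetical.

```python
from itertools import product
from typing import Dict

# Standard genetic code, codons enumerated in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate whole codons; '*' marks stops and 'X' any codon with ambiguous bases."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translation(seq: str) -> Dict[str, str]:
    """Protein strings for frames +1..+3 (forward) and -1..-3 (reverse complement)."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


print(six_frame_translation("ATGGCCTAA"))
```

Each frame can then be searched with a protein profile; a hit spanning stop codons or lying outside annotated genes would point to the kind of annotation error the pipeline is designed to catch.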
The identification of regulatory regions for a gene is an important step towards deciphering its regulation. Regulatory regions tend to be conserved during evolution, which facilitates the use of comparative genomics to identify them. The present study makes use of this attribute to identify regulatory regions in Mycobacterium species and then develops a database, MycoRRdb. It contains the regulatory regions identified within the intergenic regions of 25 mycobacterial species. MycoRRdb allows users to retrieve the identified intergenic regulatory elements in the mycobacterial genomes. In addition to the predicted motifs, it also allows users to retrieve the Reciprocal Best BLAST Hits across the mycobacterial genomes. It is a useful resource for understanding the transcriptional regulatory mechanisms of mycobacterial species. This database is the first of its kind to specifically and comprehensively address cis-regulatory regions across mycobacterial species. Database URL: http://mycorrdb.uohbif.in.
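A Reciprocal Best BLAST Hit pairs gene a in genome A with gene b in genome B when each is the other's highest-scoring hit. This is a minimal sketch that assumes the hits have already been parsed into (query, subject, bit score) tuples, for example from BLAST tabular output. It is not MycoRRdb's code, ties are resolved by keeping the first hit seen, and the gene identifiers are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Hit = Tuple[str, str, float]  # (query id, subject id, bit score)


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Highest-scoring subject for each query; the first hit wins ties."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {q: s for q, (s, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[Hit], b_vs_a: Iterable[Hit]) -> List[Tuple[str, str]]:
    """Gene pairs (a, b) where b is a's best hit in B and a is b's best hit in A."""
    ab = best_hits(a_vs_b)
    ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in ab.items() if ba.get(b) == a)


a_vs_b = [("a1", "b1", 200.0), ("a1", "b2", 150.0), ("a2", "b2", 90.0)]
b_vs_a = [("b1", "a1", 198.0), ("b2", "a1", 160.0), ("b2", "a2", 80.0)]
print(reciprocal_best_hits(a_vs_b, b_vs_a))  # [('a1', 'b1')]
```

Here a2 is excluded because its best hit, b2, prefers a1. This one-to-one requirement is what makes RBH a conservative proxy for orthology when comparing flanking genes of conserved intergenic regions.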
A range of novel carboxamide fungicides, which inhibit the succinate dehydrogenase enzyme (SDH, EC 1.3.5.1), are currently being introduced to the crop protection market. The aim of this study was to explore the impact of structurally distinct carboxamides on the development of target-site resistance and to assess the possible impact of such resistance on fitness.
We used a UV mutagenesis approach in Mycosphaerella graminicola, a key pathogen of wheat, to compare the nature, frequencies and impact of target mutations towards five subclasses of carboxamides. From this screen we identified 27 amino acid substitutions occurring at 18 different positions on the 3 subunits constituting the ubiquinone binding (Qp) site of the enzyme. The nature of the substitutions and the cross-resistance profiles indicated significant differences in how the different inhibitors interact with the enzyme. Pharmacophore elucidation followed by docking studies in a three-dimensional SDH model allowed us, for the first time, to propose rational hypotheses explaining some of these differential behaviors. Interestingly, all the characterized substitutions had a negative impact on enzyme efficiency; however, very low levels of enzyme activity appeared to be sufficient for cell survival. To explore the impact of mutations on pathogen fitness in vivo and in planta, homologous recombinants were generated for a selection of mutation types. In vivo, in contrast to previous studies performed in yeast and other organisms, SDH mutations did not result in a major increase in reactive oxygen species levels and did not display any significant fitness penalty. However, a number of Qp site mutations affecting enzyme efficiency were shown to have a biological impact in planta.
Using the combined approaches described here, we have significantly improved our understanding of possible resistance mechanisms to carboxamides and performed a preliminary fitness-penalty assessment in an economically important plant pathogen, years ahead of possible resistance development in the field.
Throughout evolution, the LIM domain has been deployed in many different domain configurations, which has led to the formation of a large and distinct group of proteins. LIM proteins are involved in relaying stimuli received at the cell surface to the nucleus in order to regulate cell structure, motility, and division. Despite their fundamental roles in cellular processes and human disease, little is known about the evolution of the LIM superclass.
We have identified and characterized all known LIM domain-containing proteins in six metazoans and three non-metazoans. In addition, we performed a phylogenetic analysis on all LIM domains and, in the process, identified a number of novel non-LIM domains and motifs in each of these proteins. Based on these results, we have formalized a classification system for LIM proteins, provided reasonable estimates of when classes and families originated, and identified lineage-specific loss events. Our analysis is the first detailed description of the full set of LIM proteins from the non-bilaterian species examined in this study.
Six of the 14 LIM classes originated in the stem lineage of the Metazoa. The expansion of the LIM superclass at the base of the Metazoa undoubtedly contributed to the increase in subcellular complexity required for the transition from a unicellular to multicellular lifestyle and, as such, was a critically important event in the history of animal multicellularity.
Members of the homeodomain-leucine zipper (HD-Zip) gene family encode transcription factors that are unique to plants and have diverse functions in plant growth and development such as various stress responses, organ formation and vascular development. Although systematic characterization of this family has been carried out in Arabidopsis and rice, little is known about HD-Zip genes in maize (Zea mays L.).
Methods and Findings
In this study, we describe the identification and structural characterization of HD-Zip genes in the maize genome. A complete set of 55 HD-Zip genes (Zmhdz1-55) was identified in the maize genome using BLAST search tools and categorized into four classes (HD-Zip I-IV) based on phylogeny. The chromosomal locations of these genes revealed that they are distributed unevenly across all 10 chromosomes. Segmental duplication contributed largely to the expansion of the maize HD-Zip gene family, while tandem duplication was responsible only for the amplification of the HD-Zip II genes. Furthermore, most of the maize HD-Zip I genes were found to contain an overabundance of stress-related cis-elements in their promoter sequences. The expression levels of the 17 HD-Zip I genes under drought stress were also investigated by quantitative real-time PCR (qRT-PCR). All 17 maize HD-Zip I genes were found to be regulated by drought stress, and the duplicated genes within each sister pair exhibited similar expression patterns, suggesting that their functions have been conserved during evolution.
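Tallying stress-related cis-elements in promoter sequences comes down to scanning both strands for short motifs. In this minimal sketch, the two core motifs (ACGTG for ABRE-like and CCGAC for DRE/CRT-like elements) and the promoter fragment are illustrative assumptions. The text does not state which elements or which database were used, and dedicated plant cis-element resources also handle degenerate consensus sequences.

```python
from typing import Dict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def count_overlapping(seq: str, motif: str) -> int:
    """Occurrences of `motif` in `seq`, allowing overlaps."""
    return sum(1 for i in range(len(seq) - len(motif) + 1) if seq.startswith(motif, i))


def count_elements(promoter: str, motifs: Dict[str, str]) -> Dict[str, int]:
    """Count each motif on the forward strand plus its reverse complement."""
    promoter = promoter.upper()
    counts = {}
    for name, motif in motifs.items():
        rc = motif.translate(COMPLEMENT)[::-1]
        n = count_overlapping(promoter, motif)
        if rc != motif:  # palindromic motifs would otherwise be counted twice
            n += count_overlapping(promoter, rc)
        counts[name] = n
    return counts


# Hypothetical core motifs and a toy promoter fragment.
MOTIFS = {"ABRE-like": "ACGTG", "DRE-like": "CCGAC"}
print(count_elements("TTACGTGGTCGGACGTGA", MOTIFS))
```

Comparing such counts against a background (for example shuffled promoters or promoters of other gene classes) is what would justify calling a class "overabundant" in an element.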
Our results reveal a comprehensive overview of the maize HD-Zip gene family and provide the first step towards the selection of Zmhdz genes for cloning and functional research to uncover their roles in maize growth and development.
It is difficult to identify copy number variations (CNVs) in normal human genomic data because of noise and non-linear relationships between different genomic regions and signal intensity. A high-resolution array comparative genomic hybridization (aCGH) platform containing 42 million probes, which is very large compared to previous arrays, was recently published. Most existing CNV detection algorithms do not work well on such data, both because of the noise associated with the large amount of input data and because most current methods were not designed to analyze normal human samples. Normal human genome analysis often requires a joint approach across multiple samples; however, the majority of existing methods can only identify CNVs from a single sample.
Methodology and Principal Findings
We developed a multi-sample-based genomic variations detector (MGVD) that combines segmentation, used to identify breakpoints shared across multiple samples, with a k-means-based clustering strategy. Unlike previous methods, MGVD simultaneously considers multiple samples with different genomic intensities and identifies both CNVs and CNV zones (CNVZs); a CNVZ is a more precise measure of the location of a genomic variant than a CNV region (CNVR).
Conclusions and Significance
We designed a specialized algorithm to detect common CNVs from extremely high-resolution multi-sample aCGH data. MGVD showed high sensitivity and a low false discovery rate on a simulated data set, and outperformed most current methods when real, high-resolution HapMap datasets were analyzed. MGVD also had the fastest runtime of the algorithms evaluated on actual high-resolution aCGH data. The CNVZs identified by MGVD can be used in association studies to reveal relationships between phenotypes and genomic aberrations. Our algorithm was implemented in standard C++ using the STL and is available for Linux and MS Windows. It is freely available at: http://embio.yonsei.ac.kr/~Park/mgvd.php.
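The k-means step can be illustrated on segment-level log2 intensity ratios, grouping segments into loss, normal and gain states. This is a minimal one-dimensional Lloyd's k-means sketch under that assumption. It is not MGVD's implementation, which also identifies breakpoints shared across samples, and the ratios and function name are hypothetical.

```python
from typing import List, Sequence, Tuple


def kmeans_1d(values: Sequence[float], k: int = 3, max_iter: int = 100) -> Tuple[List[float], List[int]]:
    """Lloyd's k-means on scalar values. Centres start at evenly spaced order statistics
    and are returned in increasing order, so with k = 3 the labels read
    0 = loss, 1 = normal, 2 = gain."""
    if k < 2 or len(values) < k:
        raise ValueError("need k >= 2 and at least k values")
    xs = sorted(values)
    centers = [xs[round(i * (len(xs) - 1) / (k - 1))] for i in range(k)]

    def nearest(v: float) -> int:
        return min(range(k), key=lambda c: abs(v - centers[c]))

    for _ in range(max_iter):
        clusters: List[List[float]] = [[] for _ in range(k)]
        for v in values:
            clusters[nearest(v)].append(v)
        # An empty cluster keeps its previous centre.
        new_centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    centers = sorted(centers)
    return centers, [nearest(v) for v in values]


# Hypothetical mean log2 ratios of seven segments.
segment_log2 = [-1.0, -0.9, 0.0, 0.1, -0.05, 0.95, 1.05]
centers, labels = kmeans_1d(segment_log2)
print(labels)  # [0, 0, 1, 1, 1, 2, 2]
```

Clustering segment means rather than individual probes keeps the problem small even for very dense arrays, which is one plausible reason to pair segmentation with k-means.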
Single cell genomics is a powerful and increasingly popular tool for studying the genetic make-up of uncultured microbes. A key challenge for successful single cell sequencing and analysis is the removal of exogenous DNA from whole genome amplification reagents. We found that UV irradiation of the multiple displacement amplification (MDA) reagents, including the Phi29 polymerase and random hexamer primers, effectively eliminates the amplification of contaminating DNA. The methodology is quick, simple, and highly effective, thus significantly improving whole genome amplification from single cells.
Fungal laccases have been used in various fields, ranging from processes in the wood and paper industries to environmental applications. Although a few bacterial laccases have been characterized in recent years, prokaryotes have largely been neglected as a source of novel enzymes, in part due to the lack of knowledge about the diversity and distribution of laccases within Bacteria. In this work, genes for laccase-like enzymes were searched for in over 2,200 complete and draft bacterial genomes and four metagenomic datasets, using custom profile hidden Markov models for two- and three-domain laccases. More than 1,200 putative genes for laccase-like enzymes were retrieved from chromosomes and plasmids of diverse bacteria. Signal peptides were predicted in 76% of the genes, indicating that these bacterial laccases may be exported from the cytoplasm, contrary to current belief. Moreover, several examples of putatively horizontally transferred bacterial laccase genes are described. Many metagenomic sequences encoding fragments of laccase-like enzymes could not be phylogenetically assigned, indicating considerable novelty. Laccase-like genes were also found in anaerobic bacteria, autotrophs and alkaliphiles, opening new hypotheses regarding their ecological functions. Bacteria identified as carrying laccase genes represent potential sources for future biotechnological applications.
Elongation factor G (EFG) is a core translational protein that catalyzes the elongation and recycling phases of translation. Analyses of heterogeneous EFG family members are revealing a more complex picture of EFG evolution and function than previously accepted. Because gene duplication is postulated to be a prominent source of functional novelty, the striking divergence between EFG paralogs can be interpreted in terms of innovation in gene function.
We present a computational study of the EFG protein family to examine the role of gene duplication in the evolution of protein function. Using phylogenetic methods, genome context conservation and insertion/deletion (indel) analysis, we demonstrate that the EFG gene copies form four subfamilies: EFG I, spdEFG1, spdEFG2, and EFG II. These ancient gene families differ in their indispensability, degree of divergence and number of indels. We show the distribution of EFG subfamilies and describe evidence for lateral gene transfer and recent duplications. More detailed analyses of the EFG II subfamily address its divergent nature. Remarkably, EFG II appears to be a widely distributed and highly diversified subfamily whose subdivisions correlate with phylum or class borders. Characteristics specific to the EFG II subfamily are low conservation of the GTPase domain and of domains II and III; absence of the trGTPase-specific G2 consensus motif “RGITI”; and twelve conserved positions common to the whole subfamily. The EFG II-specific functional changes could be related to altered nucleotide binding and hydrolysis properties and to strengthened ionic interactions between EFG II and the ribosome, particularly between parts of the decoding site and loop I of domain IV.
Our work is the first to comprehensively identify and describe EFG subfamilies, and it improves our understanding of the function and evolution of duplicated EFG genes.
Whole-genome comparative studies of many bacterial pathogens have shown an overall high similarity of gene content (>95%) between phylogenetically distinct subspecies. In highly clonal species that share the bulk of their genomes, subtle changes in gene content and small-scale polymorphisms, especially those that may alter gene expression and protein-protein interactions, are more likely to have a significant effect on the pathogen's biology. To better understand the molecular attributes that may mediate the adaptation of virulence in infectious bacteria, a comparative study was carried out to further analyze the evolution of a gene encoding an O-methyltransferase that had previously been identified as a candidate virulence factor because it is conserved specifically in highly pathogenic Francisella tularensis subsp. tularensis strains. The O-methyltransferase gene is located in the genomic neighborhood of a known pathogenicity island and a predicted site of rearrangement. Distinct O-methyltransferase subtypes are present in different Francisella tularensis subspecies. Related protein families were identified in several host species, as well as in species of pathogenic bacteria that are otherwise phylogenetically very distant from Francisella, including species of Mycobacterium. A conserved sequence motif profile is present in both the mammalian host and the pathogen protein sequences, and sites of non-synonymous variation conserved in the Francisella subspecies-specific O-methyltransferases map close to the predicted active site of the orthologous human protein structure. Altogether, the evidence suggests a role for the F. t. subsp. tularensis protein in a mechanism of molecular mimicry, perhaps similar to those in Legionella and Coxiella. These findings therefore provide insights into the evolution of niche restriction and virulence in Francisella, and have broader implications for the molecular mechanisms that mediate host-pathogen relationships.
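Locating sites of non-synonymous variation begins with classifying codon differences between aligned coding sequences. The minimal sketch below assumes two gap-free, in-frame DNA sequences of equal length, containing only A, C, G, and T, and the standard genetic code. It simply labels each differing codon by whether the encoded amino acid changes; it is not a dN/dS estimator and does not reproduce the authors' analysis. The function name and example sequences are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; "*" marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_codon_differences(seq1, seq2):
    """Compare two in-frame coding sequences codon by codon and return the
    0-based codon indices of synonymous and non-synonymous differences."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be equal in length and a multiple of 3")
    result = {"synonymous": [], "non-synonymous": []}
    for start in range(0, len(seq1), 3):
        c1 = seq1[start:start + 3].upper()
        c2 = seq2[start:start + 3].upper()
        if c1 == c2:
            continue
        same_aa = CODON_TABLE[c1] == CODON_TABLE[c2]
        result["synonymous" if same_aa else "non-synonymous"].append(start // 3)
    return result


# GCT -> GCC keeps alanine; AAA -> AGA changes lysine to arginine.
print(classify_codon_differences("ATGGCTAAA", "ATGGCCAGA"))
```

Codons that differ at more than one position are labeled only by their net amino-acid effect, which is a deliberate simplification.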
As next-generation sequencing (NGS) output continues to increase, analysis is becoming a significant bottleneck. However, when information is required only for specific sequence variants, it is not necessary to assemble or align whole-genome data sets in their entirety. Instead, NGS data sets can be mined for the presence of sequence variants of interest by localized assembly, which is a faster, easier, and more accurate approach. We present TASR, a streamlined assembler that interrogates very large NGS data sets for the presence of specific variants by considering only reads within the sequence space of target sequences supplied by the user. The NGS data set is searched for reads that exactly match any of the possible short words within the target sequence, and these reads are then stringently assembled to generate a consensus of the target and its flanking sequence. Typically, variants of a particular locus are provided as different target sequences, and the presence of a variant in the interrogated data set is revealed by a successful assembly. TASR can also be used to find unknown sequences that flank a given target. We demonstrate that TASR is useful for finding or confirming genomic mutations, polymorphisms, fusions, and integration events. Targeted assembly is a powerful method for interrogating large data sets for the presence of sequence variants of interest, and TASR is a fast, flexible, and easy-to-use tool for targeted assembly.
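The read-recruitment step described above can be sketched as a k-mer lookup: every word of length k in the target (and its reverse complement) goes into a set, and a read is kept if any of its own k-mers hits that set. This is a hypothetical illustration, not TASR's code; the word length `k=15`, the function names, and the restriction to A/C/G/T/N bases are assumptions, and the subsequent stringent assembly of recruited reads is not shown.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA sequence over the alphabet A, C, G, T, N."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def target_kmers(target, k):
    """All length-k words from the target and its reverse complement."""
    kmers = set()
    for strand in (target.upper(), reverse_complement(target)):
        for i in range(len(strand) - k + 1):
            kmers.add(strand[i:i + k])
    return kmers


def recruit_reads(reads, target, k=15):
    """Keep reads that share at least one exact k-mer with the target."""
    kmers = target_kmers(target, k)
    recruited = []
    for read in reads:
        read_up = read.upper()
        if any(read_up[i:i + k] in kmers for i in range(len(read_up) - k + 1)):
            recruited.append(read)
    return recruited


reads = ["CCCGATTACCC", "TGTAATCCC", "CCCCCCCC"]
# The second read matches only through the reverse-complement strand.
print(recruit_reads(reads, "GATTACAGATTACA", k=5))
```

Because the lookup touches each read once and only does set membership tests, its cost grows linearly with the data set, which is what makes targeted recruitment cheap compared with whole-genome alignment.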
DNA sequencing of the nuclear ribosomal ITS region from environmental samples such as soil and wood has revealed an abundance of novel fungal lineages. Although phylogenetic analysis of these novel lineages is a key component of unveiling the structure and diversity of complex communities, such analyses are rare for environmental ITS data because this locus is difficult to align across highly divergent taxa. One potential solution to this problem is simultaneous alignment and tree estimation. We targeted divergent ITS sequences of the earth tongue fungi (Geoglossomycetes), a basal class in the Ascomycota, to assess the performance of SATé, recent software that combines progressive alignment and tree building. We found that SATé performed well, generating high-quality alignments and accurately estimating the phylogeny of earth tongue fungi. Drawing on a data set of 300 sequences of earth tongues and progressively more distant fungal lineages, 30 insufficiently identified ITS sequences from public sequence databases were assigned to the Geoglossomycetes. An association between earth tongues and plants has long been hypothesized, but hard evidence has yet to be collected. The ITS phylogeny showed that four ectomycorrhizal isolates shared a clade with Geoglossum but not with Trichoglossum earth tongues, pointing to the significant potential of ecological data mining of environmental samples. Environmental sampling holds the key to many focal questions in mycology, and simultaneous alignment and tree estimation, as performed by SATé, can be a highly efficient companion in that pursuit.
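Progressive aligners build multiple alignments from pairwise alignment steps, and the difficulty with divergent ITS sequences shows up there as ambiguous gap placement. As a minimal illustration of that building block only, here is Needleman–Wunsch global pairwise alignment with a linear gap penalty. It is not SATé's algorithm, which iterates alignment and tree estimation; the scoring values (`match=1`, `mismatch=-1`, `gap=-2`) are arbitrary assumptions.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of strings a and b with a linear gap penalty.
    Returns (score, aligned_a, aligned_b), using '-' for gaps."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(x, y):
        return match if x == y else mismatch

    # Fill the dynamic-programming matrix.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("GATTACA", "GATACA"))
```

With divergent sequences, many tracebacks tie or nearly tie, so the alignment depends heavily on the scoring scheme; this is the sensitivity that co-estimating the alignment and the tree is meant to reduce.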
The development of new microarray technologies makes custom long-oligonucleotide arrays affordable for many experimental applications, notably gene expression analyses. Reliable results depend on the quality of probe design and selection. A probe design strategy must cope with the limited accuracy of de novo gene prediction programs and with annotation updates. We present a novel in silico procedure that addresses these issues and includes experimental screening, because an empirical approach is the best strategy for identifying the optimal probes among the in silico candidates.
We used four criteria for in silico probe selection: cross-hybridization, hairpin stability, probe location relative to the end of the coding sequence, and intron position. The last criterion is critical when exon-intron structure predictions for intron-rich genes are inaccurate. For each coding sequence (CDS), we selected a subset of four probes. These probes were included in a test microarray, which was used to evaluate the hybridization behavior of each probe. The best probe for each CDS was selected according to three experimental criteria: signal-to-noise ratio, signal reproducibility, and representative signal intensities. This procedure was applied to develop an Agilent gene expression platform for the filamentous fungus Podospora anserina and to select a single 60-mer probe for each of the 10,556 P. anserina CDSs.
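The location and intron-position criteria above can be sketched as a window filter over CDS coordinates. In this hypothetical example, candidate 60-mer windows are stepped back from the 3' end of the CDS, windows that span a predicted exon-exon junction are discarded (so an inaccurate intron prediction cannot split a probe), and the first four survivors are kept. The parameter names, the 20-nt step, and the 1,500-nt distance cap are assumptions; cross-hybridization and hairpin stability are not modeled.

```python
def candidate_probe_windows(cds_length, junctions, probe_length=60,
                            n_probes=4, max_distance=1500, step=20):
    """Return up to n_probes (start, end) windows (0-based, half-open) in the
    spliced CDS, closest to the 3' end first, skipping any window that spans
    a predicted exon-exon junction. A junction at position p lies between
    nucleotides p-1 and p."""
    windows = []
    start = cds_length - probe_length
    while start >= 0 and len(windows) < n_probes:
        end = start + probe_length
        # Stop once the probe's 3' end is too far from the end of the CDS.
        if cds_length - end > max_distance:
            break
        if not any(start < p < end for p in junctions):
            windows.append((start, end))
        start -= step
    return windows


# A 300-nt CDS with one predicted junction at position 250.
print(candidate_probe_windows(300, [250]))
```

In this example the three windows closest to the 3' end all straddle position 250, so the first kept window is (180, 240).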
A reliable version of the gene expression microarray, based on the Agilent 44K platform, was developed with four spot replicates of each probe to increase the statistical power of the analysis.
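The experimental selection of one probe per CDS, using replicate spots, might be sketched as follows. This is a hypothetical simplification of two of the three stated criteria: reproducibility is taken as a coefficient of variation (CV) across replicate intensities below an assumed threshold (`max_cv=0.2`), and signal-to-noise ratio as mean intensity divided by an assumed background standard deviation. The "representative signal intensities" criterion is not modeled, and none of these definitions are claimed to be the authors'.

```python
from statistics import mean, stdev


def select_best_probe(probe_signals, background_sd, max_cv=0.2):
    """probe_signals maps probe id -> list of replicate spot intensities.
    Among probes whose replicate CV is at most max_cv, return the id with
    the highest signal-to-noise ratio, or None if no probe qualifies."""
    best_id, best_snr = None, float("-inf")
    for probe_id, signals in probe_signals.items():
        if len(signals) < 2:
            continue  # reproducibility cannot be assessed from one spot
        avg = mean(signals)
        if avg <= 0:
            continue
        if stdev(signals) / avg > max_cv:
            continue  # replicates disagree too much
        snr = avg / background_sd
        if snr > best_snr:
            best_id, best_snr = probe_id, snr
    return best_id


# Four replicate spots per probe, as on the final array.
signals = {
    "p1": [100, 102, 98, 100],
    "p2": [500, 100, 900, 300],   # bright but irreproducible
    "p3": [300, 310, 290, 300],
}
print(select_best_probe(signals, background_sd=10))
```

Filtering on reproducibility before ranking by signal prevents a probe with a single very bright replicate from winning, which is one reason replicate spots help.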