We report the sequences of 1,244 human Y chromosomes randomly ascertained from 26 worldwide populations by the 1000 Genomes Project. We discovered more than 65,000 variants, including SNVs, MNVs, indels, STRs, and CNVs. Of these, CNVs contribute the greatest predicted functional impact. We constructed a calibrated phylogenetic tree based on binary SNVs and projected the more complex variants onto it, estimating the numbers of mutations for each class. Our phylogeny reveals bursts of extreme expansions in male numbers that have occurred independently among each of the five continental superpopulations examined, at times of known migrations and technological innovations.
Collective animal behavior, such as the flocking of birds or the shoaling of fish, has inspired a class of algorithms designed to optimize distance-based clusters in various applications, including document analysis and DNA microarrays. In a flocking model, individual agents respond only to their immediate environment and move according to a few simple rules. After several iterations the agents self-organize, and clusters emerge without the need for partitional seeds. In addition to its unsupervised nature, flocking offers several computational advantages, including the potential to reduce the number of required comparisons.
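The local-rule idea described above can be illustrated with a minimal sketch. It is not Clusterflock's actual rule set: the function names (`steering`, `flock`) and the parameters `radius`, `threshold`, and `step_size` are hypothetical, and the single attraction/repulsion rule is a simplifying assumption. Each agent looks only at neighbours within `radius` and is pulled toward those whose data distance is below `threshold` and pushed away from the rest.

```python
import math


def steering(i, positions, dist, radius=0.3, threshold=0.5):
    """Return the (dx, dy) pull on agent i from neighbours within `radius`.

    dist[i][j] is a data distance on the unit interval. Neighbours with a
    distance below `threshold` attract; the others repel (assumed rule).
    """
    xi, yi = positions[i]
    dx = dy = 0.0
    for j, (xj, yj) in enumerate(positions):
        if j == i:
            continue
        gap = math.hypot(xj - xi, yj - yi)
        if gap == 0.0 or gap > radius:
            continue  # agents only respond to their immediate environment
        weight = threshold - dist[i][j]  # > 0 attracts, < 0 repels
        dx += weight * (xj - xi) / gap
        dy += weight * (yj - yi) / gap
    return dx, dy


def flock(positions, dist, steps=100, step_size=0.01, radius=0.3, threshold=0.5):
    """Move all agents synchronously for `steps` iterations."""
    pos = list(positions)
    for _ in range(steps):
        moves = [steering(i, pos, dist, radius, threshold) for i in range(len(pos))]
        pos = [(x + step_size * dx, y + step_size * dy)
               for (x, y), (dx, dy) in zip(pos, moves)]
    return pos


if __name__ == "__main__":
    # Two similar items (distance 0.0) and one dissimilar item (distance 1.0).
    start = [(0.0, 0.0), (0.2, 0.0), (0.1, 0.1)]
    d = [[0.0, 0.0, 1.0],
         [0.0, 0.0, 1.0],
         [1.0, 1.0, 0.0]]
    print(flock(start, d, steps=10))
```

No cluster count or seed is supplied; any groups that form come from the pairwise distances alone, which is the property the paragraph above highlights.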
In the tool presented here, Clusterflock, we have implemented a flocking algorithm designed to locate groups (flocks) of orthologous gene families (OGFs) that share an evolutionary history. Pairwise distances that measure phylogenetic incongruence between OGFs guide flock formation. We tested this approach on several simulated datasets by varying the number of underlying topologies, the proportion of missing data, and evolutionary rates, and show that in datasets containing high levels of missing data and rate heterogeneity, Clusterflock outperforms other well-established clustering techniques. We also verified its utility on a known, large-scale recombination event in Staphylococcus aureus. By isolating sets of OGFs with divergent phylogenetic signals, we were able to pinpoint the recombined region without forcing a pre-determined number of groupings or defining a pre-determined incongruence threshold.
Clusterflock is an open-source tool that can be used to discover horizontally transferred genes, recombined areas of chromosomes, and the phylogenetic ‘core’ of a genome. Although we used it here in an evolutionary context, it is generalizable to any clustering problem. Users can write extensions to calculate any distance metric on the unit interval, and can use these distances to ‘flock’ any type of data.
Swarms; Flocking algorithm; Unsupervised clustering; Data mining; Horizontal gene transfer; Recombination; Staphylococcus aureus
Throughout their evolutionary history, genomes acquire new genetic material that facilitates phenotypic innovation and diversification. Developmental processes associated with reproduction are particularly likely to involve novel genes. Abundant gene creation impacts the evolution of chromosomal gene content and general regulatory mechanisms such as dosage compensation. Numerous studies in model organisms have found complex and, at times, contradictory relationships among these genomic attributes, highlighting the need to examine these patterns in other systems characterized by abundant sexual selection. Therefore, we examined the association among novel gene creation, tissue-specific gene expression, and chromosomal gene content within stalk-eyed flies. Flies in this family are characterized by strong sexual selection and the presence of a newly evolved X chromosome. We generated RNA-seq transcriptome data from the testes for three species within the family and from seven additional tissues in the highly dimorphic species, Teleopsis dalmanni. Analysis of dipteran gene orthology reveals dramatic testes-specific gene creation in stalk-eyed flies, involving numerous gene families that are highly conserved in other insect groups. Identification of X-linked genes for the three species indicates that the X chromosome arose prior to the diversification of the family. The most striking feature of this X chromosome is that it is highly masculinized, containing nearly twice as many testes-specific genes as expected based on its size. All the major processes that may drive differential sex chromosome gene content (creation of genes with male-specific expression, development of male-specific expression from pre-existing genes, and movement of genes with male-specific expression) are elevated on the X chromosome of T. dalmanni.
This masculinization occurs despite evidence that testes-expressed genes do not achieve the same levels of gene expression on the X chromosome as they do on the autosomes.
diopsid; comparative transcriptomes; gene duplication; sex-specific gene expression; meiotic drive; dosage compensation
The common bed bug (Cimex lectularius) has been a persistent pest of humans for thousands of years, yet the genetic basis of the bed bug's basic biology and adaptation to dense human environments is largely unknown. Here we report the assembly, annotation and phylogenetic mapping of the 697.9-Mb Cimex lectularius genome, with an N50 of 971 kb, using both long and short read technologies. An RNA-seq time course across all five developmental stages and male and female adults generated 36,985 coding and noncoding gene models. The most pronounced change in gene expression during the life cycle occurs after feeding on human blood and includes genes from the Wolbachia endosymbiont, which shows a simultaneous and coordinated host/commensal response to haematophagous activity. These data provide a rich genetic resource for mapping activity and density of C. lectularius across human hosts and cities, which can help track, manage and control bed bug infestations.
The common bedbug is a pest for humans, yet its molecular biology is poorly understood. Here, the authors sequence the common bedbug genome and profile gene expression across all life stages to show major changes in gene expression after feeding on human blood.
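The assembly statistic quoted above (N50) can be made concrete with a small sketch. It uses the usual definition, the length of the shortest sequence among the longest sequences that together cover at least half of the assembly. The function name `n50` and the toy lengths are illustrative assumptions and are not taken from the bed bug assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no sequence lengths supplied")


if __name__ == "__main__":
    toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]  # hypothetical lengths, total 300
    print(n50(toy_scaffolds))  # 80 + 70 reaches half of 300, so this prints 70
```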
Twenty-one fully sequenced and well annotated insect genomes were used to construct genome content matrices for phylogenetic analysis and functional annotation of insect genomes. To examine the role of e-value cutoff in ortholog determination we used scaled e-value cutoffs and a single linkage clustering approach. The present communication includes (1) a list of the genomes used to construct the genome content phylogenetic matrices, (2) a nexus file with the data matrices used in phylogenetic analysis, (3) a nexus file with the Newick trees generated by phylogenetic analysis, (4) an Excel file listing the Core (CORE) genes and Unique (UNI) genes found in five insect groups, and (5) a figure showing a plot of consistency index (CI) versus percent of unannotated genes that are apomorphies in the data set for gene losses and gains and bar plots of gains and losses for four consistency index (CI) cutoffs.
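Single linkage clustering under an e-value cutoff can be sketched as follows. This is an illustration only: the source does not define how the cutoffs were scaled, so the sketch takes a plain cutoff. The function name `single_linkage` and the gene labels are hypothetical. Any pairwise hit at or below the cutoff joins two genes into the same cluster, and membership is transitive.

```python
def single_linkage(hits, cutoff):
    """Cluster genes from (gene_a, gene_b, evalue) hits by single linkage.

    Two genes share a cluster if a chain of hits with evalue <= cutoff
    connects them. Returns a sorted list of sorted gene lists.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, evalue in hits:
        root_a, root_b = find(a), find(b)  # registers both genes
        if evalue <= cutoff and root_a != root_b:
            parent[root_a] = root_b

    clusters = {}
    for gene in parent:
        clusters.setdefault(find(gene), set()).add(gene)
    return sorted(sorted(members) for members in clusters.values())


if __name__ == "__main__":
    toy_hits = [  # hypothetical genes and e-values
        ("A1", "B1", 1e-50),
        ("B1", "C1", 1e-40),
        ("A2", "B2", 1e-12),
        ("C1", "A2", 1e-3),
    ]
    for cutoff in (1e-10, 1e-20):
        print(cutoff, single_linkage(toy_hits, cutoff))
```

Tightening the cutoff from 1e-10 to 1e-20 splits the A2/B2 pair in this toy input, which is the kind of cutoff sensitivity the paragraph above sets out to examine.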
We report the identification and characterization of two new members of a family of bilirubin-inducible fluorescent proteins (FPs) from marine chlopsid eels and demonstrate a key region of the sequence that serves as an evolutionary switch from non-fluorescent to fluorescent fatty acid-binding proteins (FABPs). Using transcriptomic analysis of two species of brightly fluorescent Kaupichthys eels (Kaupichthys hyoproroides and Kaupichthys n. sp.), two new FPs were identified, cloned and characterized (Chlopsid FP I and Chlopsid FP II). We then performed phylogenetic analysis on 210 FABPs, spanning 16 vertebrate orders, and including 163 vertebrate taxa. We show that the fluorescent FPs diverged as a protein family and are the sister group to brain FABPs. Our results indicate that the evolution of this family involved at least three gene duplication events. We show that fluorescent FABPs possess a unique, conserved tripeptide Gly-Pro-Pro sequence motif, which is not found in non-fluorescent fatty acid binding proteins. This motif arose from a duplication event of the FABP brain isoforms and was under strong purifying selection, leading to the classification of this new FP family. Residues adjacent to the motif are under strong positive selection, suggesting a further refinement of the eel protein’s fluorescent properties. We present a phylogenetic reconstruction of this emerging FP family and describe additional fluorescent FABP members from groups of distantly related eels. The elucidation of this class of fish FPs with diverse properties provides new templates for the development of protein-based fluorescent tools. The evolutionary adaptation from fatty acid-binding proteins to fluorescent fatty acid-binding proteins raises intrigue as to the functional role of bright green fluorescence in this cryptic genus of reclusive eels that inhabit a blue, nearly monochromatic, marine environment.
In order to complete their life cycle, papillomaviruses have evolved to manipulate a plethora of cellular pathways. The products of the human Alphapapillomavirus E6 proteins specifically interact with and target PDZ containing proteins for degradation. This viral phenotype has been suggested to play a role in viral oncogenesis. To analyze the association of HPV E6 mediated PDZ-protein degradation with cervical oncogenesis, a high-throughput cell culture assay was developed. Degradation of an epitope tagged human MAGI1 isoform was visualized by immunoblot. The correlation between HPV E6-induced degradation of hMAGI1 and epidemiologically determined HPV oncogenicity was evaluated using a Bayesian approach within a phylogenetic context. All tested oncogenic types degraded the PDZ-containing protein hMAGI1d; however, E6 proteins isolated from several related albeit non-oncogenic viral types were equally efficient at degrading hMAGI1. The relationship between both traits (oncogenicity and PDZ degradation potential) is best explained by a model in which the potential to degrade PDZ proteins was acquired prior to the oncogenic phenotype. This analysis provides evidence that the ancestor of both oncogenic and non-oncogenic HPVs acquired the potential to degrade human PDZ-containing proteins. This suggests that HPV E6 directed degradation of PDZ-proteins represents an ancient ecological niche adaptation. Phylogenetic modeling indicates that this phenotype is not specifically correlated with oncogenic risk, but may act as an enabling phenotype. The role of PDZ protein degradation in HPV fitness and oncogenesis needs to be interpreted in the context of Alphapapillomavirus evolution.
It is thought that the ability to degrade PDZ domain containing proteins is a hallmark of oncogenic papillomaviruses. However, since papillomaviruses did not evolve to be oncogenic, this hypothesis does not address the evolutionary importance of this phenotype. The present manuscript attempts to address whether HPV induced degradation of PDZ containing proteins is associated with oncogenic potential as determined by the clinical/epidemiological empirical cancer risk. Using Bayesian approaches to model trait evolution we show that it is highly unlikely for a virus to become oncogenic without first acquiring the ability to degrade PDZ proteins. Furthermore, the ability to degrade PDZ proteins allowed ancestral viruses to colonize a new cellular niche. However, in order to thrive in this new environment, these ancestral viruses had to acquire additional functions. We hypothesize that some of these additional phenotypes lead to oncogenicity. Importantly, our study illustrates the power of combining epidemiological, biochemical and evolutionary data with phylogenetic analysis in attempting to understand the relative role of specific pathogen phenotypes with host pathogenesis.
The amphinomid polychaete Hermodice carunculata is a cosmopolitan and ecologically important omnivore in coral reef ecosystems, preying on a diverse suite of reef organisms and potentially acting as a vector for coral disease. While amphinomids are a key group for determining the root of the Annelida, their phylogenetic position has been difficult to resolve, and publicly available genomic data for them have been scarce.
We performed deep transcriptome sequencing (Illumina HiSeq) and profiling on Hermodice carunculata collected in the Western Atlantic Ocean. We focused this study on 58,454 predicted Open Reading Frames (ORFs) of genes longer than 200 amino acids for our homology search, and Gene Ontology (GO) terms and InterPro IDs were assigned to 32,500 of these ORFs. We used this de novo assembled transcriptome to recover major signaling pathways and housekeeping genes. We also identify a suite of H. carunculata genes related to reproduction and immune response.
We provide a comprehensive catalogue of annotated genes for Hermodice carunculata and expand the knowledge of reproduction and immune response genes in annelids in general. Overall, this study vastly expands the available genomic data for H. carunculata, which previously consisted of only 279 nucleotide sequences in NCBI. This underscores the utility of Illumina sequencing for de novo transcriptome assembly in non-model organisms as a cost-effective and efficient tool for gene discovery and downstream applications, such as phylogenetic analysis and gene expression profiling.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-1565-6) contains supplementary material, which is available to authorized users.
Next-generation sequencing; Hermodice carunculata; Polychaete; Molecular phylogenetics; de novo assembly; Functional annotation
Insects are the most diverse group of organisms on the planet. Variation in gene expression lies at the heart of this biodiversity and recent advances in sequencing technology have spawned a revolution in researchers' ability to survey tissue-specific transcriptional complexity across a wide range of insect taxa. Increasingly, studies are using a comparative approach (across species, sexes and life stages) that examines the transcriptional basis of phenotypic diversity within an evolutionary context. In the present review, we summarize much of this research, focusing in particular on three critical aspects of insect biology: morphological development and plasticity; physiological response to the environment; and sexual dimorphism. A common feature that is emerging from these investigations concerns the dynamic nature of transcriptome evolution as indicated by rapid changes in the overall pattern of gene expression, the differential expression of numerous genes with unknown function, and the incorporation of novel, lineage-specific genes into the transcriptional profile.
RNA-Seq; non-model organism; differential expression; comparative transcriptomics; NGS; insects
Since its initial identification as an HIV-1-inducible gene in 2002, astrocyte elevated gene-1 (AEG-1), subsequently cloned as metadherin (MTDH) and lysine-rich CEACAM1 coisolated (LYRIC), has emerged over the past 10 years as an important oncogene providing a valuable prognostic marker in patients with various cancers. Recent studies demonstrate that AEG-1/MTDH/LYRIC is a pleiotropic protein that can localize in the cell membrane, cytoplasm, endoplasmic reticulum (ER), nucleus, and nucleolus, and contributes to diverse signaling pathways such as PI3K–AKT, NF-κB, MAPK, and Wnt. In addition to tumorigenesis, this multifunctional protein is implicated in various physiological and pathological processes including development, neurodegeneration, and inflammation. The present review focuses on the discovery of AEG-1/MTDH/LYRIC and conceptualizes areas of future direction for this intriguing gene. We begin by describing how AEG-1, MTDH, and LYRIC were initially identified by different research groups and then discuss AEG-1 structure, functions, localization, and evolution. We conclude with a discussion of the expression profile of AEG-1/MTDH/LYRIC in the context of cancer, neurological disorders, inflammation, and embryogenesis, and discuss how AEG-1/MTDH/LYRIC is regulated. This introductory discussion of AEG-1/MTDH/LYRIC will serve as the basis for the detailed discussions in other chapters of the unique properties of this intriguing molecule.
Oil palm is the most productive oil-bearing crop. Planted on only 5% of the total vegetable oil acreage, palm oil accounts for 33% of vegetable oil, and 45% of edible oil worldwide, but increased cultivation competes with dwindling rainforest reserves. We report the 1.8 gigabase (Gb) genome sequence of the African oil palm Elaeis guineensis, the predominant source of worldwide oil production. 1.535 Gb of assembled sequence and transcriptome data from 30 tissue types were used to predict at least 34,802 genes, including oil biosynthesis genes and homologues of WRINKLED1 (WRI1), and other transcriptional regulators [1], which are highly expressed in the kernel. We also report the draft sequence of the South American oil palm Elaeis oleifera, which has the same number of chromosomes (2n=32) and produces fertile interspecific hybrids with E. guineensis [2], but appears to have diverged in the New World. Segmental duplications of chromosome arms define the palaeotetraploid origin of palm trees. The oil palm sequence enables the discovery of genes for important traits as well as somaclonal epigenetic alterations which restrict the use of clones in commercial plantings [3], and thus helps achieve sustainability for biofuels and edible oils, reducing the rainforest footprint of this tropical plantation crop.
The recent availability of sequenced genomes from a broad array of chordates (cephalochordates, urochordates and vertebrates) has allowed us to systematically analyze the evolution of uroplakins: tetraspanins (UPK1a and UPK1b families) and their respective partner proteins (UPK2 and UPK3 families).
We report here: (1) the origin of uroplakins in the common ancestor of vertebrates, (2) the appearance of several residues that have statistically significantly positive dN/dS ratios in the duplicated paralogs of uroplakin genes, and (3) the existence of strong coevolutionary relationships between UPK1a/1b tetraspanins and their respective UPK2/UPK3-related partner proteins. Moreover, we report the existence of three new UPK2/3 family members we named UPK2b, 3c and 3d, which will help clarify the evolutionary relationships between fish, amphibian and mammalian uroplakins that may perform divergent functions specific to these different and physiologically distinct groups of vertebrates.
Since our analyses cover species of all major chordate groups, this work provides a clear overall picture of how the uroplakin families and their partner proteins have evolved in parallel. We also highlight several novel features of uroplakin evolution including the appearance of UPK2b and 3d in fish and UPK3c in the common ancestor of reptiles and mammals. Additional studies of these novel uroplakins should lead to new insights into uroplakin structure and function.
The species Alphapapillomavirus 7 (alpha-7) contains human papillomavirus genotypes that account for 15% of invasive cervical cancers and are disproportionately associated with adenocarcinoma of the cervix. Complete genome analyses enable identification and nomenclature of variant lineages and sublineages.
The URR/E6 region was sequenced to screen for novel variants of HPV18, 39, 45, 59, 68, 70, 85 and 97 from 1147 cervical samples obtained from multiple geographic regions that had previously been shown to contain an alpha-7 HPV isolate. To study viral heterogeneity, the complete 8 kb genomes of 128 isolates, including 109 sequenced for this analysis, were annotated and analyzed. Viral evolution was characterized by constructing phylogenetic trees using maximum-likelihood and Bayesian algorithms. Global and pairwise alignments were used to calculate total and ORF/region nucleotide differences; lineages and sublineages were assigned using an alphanumeric system. The prototype genome was assigned to the A lineage or A1 sublineage.
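The pairwise nucleotide difference calculation mentioned above can be sketched for two sequences that are already aligned. The function name `percent_difference` and the toy sequences are hypothetical, and skipping gapped columns is an assumption; the source does not say how insertions and deletions were counted.

```python
def percent_difference(seq_a, seq_b, gap="-"):
    """Percent of differing nucleotides between two aligned sequences.

    Columns where either sequence has a gap are skipped (assumed policy).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        differing += a != b
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * differing / compared


if __name__ == "__main__":
    prototype = "ACGTACGTAC"  # hypothetical aligned fragments
    variant = "ACGTTCGTAC"
    print(percent_difference(prototype, variant))  # 1 of 10 columns differs: 10.0
```

Applied per ORF or region of a whole-genome alignment, the same function would give region-level values that can be ranked from most to least variable.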
The genomic diversity of alpha-7 HPV types ranged from 1.1% to 6.7% nucleotide sequence differences; the extent of genome-genome pairwise intratype heterogeneity was 1.1% for HPV39, 1.3% for HPV59, 1.5% for HPV45, 1.6% for HPV70, 2.1% for HPV18, and 6.7% for HPV68. ME180 (previously a subtype of HPV68) was designated as the representative genome for HPV68 sublineage C1. Each ORF/region differed in sequence diversity, from most variable to least variable: noncoding region 1 (NCR1) / noncoding region 2 (NCR2) > upstream regulatory region (URR) > E6 / E7 > E2 / L2 > E1 / L1.
These data provide estimates of the maximum viral genomic heterogeneity of alpha-7 HPV type variants. The proposed taxonomic system facilitates the comparison of variants across epidemiological and molecular studies. Sequence diversity, geographic distribution and phylogenetic topology of this clinically important group of HPVs suggest an independent evolutionary history for each type.
Genomic and transcriptomic sequence data are essential tools for tackling ecological problems. Using an approach that combines next-generation sequencing, de novo transcriptome assembly, gene annotation and synthetic gene construction, we identify and cluster the protein families from Favia corals from the northern Red Sea.
We obtained 80 million 75 bp paired-end cDNA reads from two Favia adult samples collected at 65 m (Fav1, Fav2) on the Illumina GA platform, and generated two de novo assemblies using ABySS and CAP3. After removing redundancy and filtering out low-quality reads, our transcriptome datasets contained 58,268 (Fav1) and 62,469 (Fav2) contigs longer than 100 bp, with N50 values of 1,665 bp and 1,439 bp, respectively. Using the proteome of the sea anemone Nematostella vectensis as a reference, we were able to annotate almost 20% of each dataset using reciprocal homology searches. Homologous clustering of these annotated transcripts allowed us to divide them into 7,186 (Fav1) and 6,862 (Fav2) homologous transcript clusters (E-value ≤ 2e-30). Functional annotation categories were assigned to homologous clusters using the functional annotation of Nematostella vectensis. General annotation of the assembled transcripts was improved by 1-3% using the Acropora digitifera proteome. In addition, we screened these transcript isoform clusters for fluorescent protein (FP) homologs and identified seven potential FP homologs in Fav1, and four in Fav2. These transcripts were validated as bona fide FP transcripts by heterologous expression, which produced robust fluorescence. Annotation of the assembled contigs revealed that 1.34% and 1.61% (in Fav1 and Fav2, respectively) of the total assembled contigs likely originated from the corals’ algal symbiont, Symbiodinium spp.
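One common way to run a reciprocal homology search is the reciprocal best hit criterion, sketched below. Whether the authors used exactly this criterion is an assumption. The function names, the toy identifiers, and the default `max_evalue` of 1e-5 are also hypothetical; the 2e-30 threshold reported above applies to the clustering step and is not reused here.

```python
def best_hits(hits, max_evalue):
    """Map each query to its lowest-E-value subject, ignoring weak hits."""
    best = {}
    for query, subject, evalue in hits:
        if evalue > max_evalue:
            continue
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(forward, reverse, max_evalue=1e-5):
    """Return (contig, reference) pairs that are each other's best hit."""
    fwd = best_hits(forward, max_evalue)   # contig -> reference protein
    rev = best_hits(reverse, max_evalue)   # reference protein -> contig
    return sorted((q, s) for q, s in fwd.items() if rev.get(s) == q)


if __name__ == "__main__":
    # Hypothetical search results: (query, subject, E-value).
    contig_vs_reference = [
        ("contig1", "NvA", 1e-40),
        ("contig1", "NvB", 1e-10),
        ("contig2", "NvA", 1e-20),
        ("contig3", "NvC", 1e-2),
    ]
    reference_vs_contig = [
        ("NvA", "contig1", 1e-38),
        ("NvB", "contig1", 1e-9),
        ("NvC", "contig3", 1e-1),
    ]
    print(reciprocal_best_hits(contig_vs_reference, reference_vs_contig))
```

In this toy input only contig1 and NvA are annotated, because contig2 is not the best hit of NvA in the reverse direction and contig3 fails the E-value filter.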
Here we present a study to identify the homologous transcript isoform clusters from the transcriptome of Favia corals using a distantly related reference proteome. Furthermore, the symbiont-derived transcripts were isolated from the datasets and their contribution quantified. This is the first annotated transcriptome of the genus Favia, a major increase in the genomic resources available for this important family of corals.
K-mer; Contig; Open reading frame; Fluorescent protein; Blast; Clustering; High-throughput sequencing; Illumina paired-end; Coral
Population-level studies of parasites have the potential to elucidate patterns of host movement and cross-species interactions that are not evident from host genealogy alone. Bat flies are obligate and generally host-specific blood-feeding parasites of bats. Old-World flies in the family Nycteribiidae are entirely wingless and depend on their hosts for long-distance dispersal; their population genetics has been unstudied to date.
We collected a total of 125 bat flies from three Pteropus species (Pteropus vampyrus, P. hypomelanus, and P. lylei) from eight localities in Malaysia, Cambodia, and Vietnam. We identified specimens morphologically and then sequenced three mitochondrial DNA gene fragments (CoI, CoII, cytB; 1744 basepairs total) from a subset of 45 bat flies. We measured genetic diversity, molecular variance, and population genetic subdivision (FST), and used phylogenetic and haplotype network analyses to quantify parasite genetic structure across host species and localities.
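The population subdivision statistic named above can be illustrated with a sequence-based estimator of the form FST = 1 - Hw/Hb, where Hw is the mean number of pairwise differences within populations and Hb the mean between populations. The source does not state which FST estimator or software was used, so this choice, the function names, and the toy haplotypes are all assumptions.

```python
from itertools import combinations, product


def mean_pairwise_differences(pairs):
    """Mean number of differing sites over an iterable of sequence pairs."""
    pairs = list(pairs)
    total = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return total / len(pairs)


def fst_two_populations(pop1, pop2):
    """Estimate FST as 1 - Hw/Hb for two populations of aligned haplotypes.

    Each population needs at least two sequences, and the populations
    must differ at one site or more (otherwise Hb is zero).
    """
    within = [mean_pairwise_differences(combinations(pop, 2)) for pop in (pop1, pop2)]
    h_w = sum(within) / 2
    h_b = mean_pairwise_differences(product(pop1, pop2))
    return 1 - h_w / h_b


if __name__ == "__main__":
    # Hypothetical aligned haplotypes from two localities.
    locality_a = ["AAAA", "AAAT"]
    locality_b = ["TTTT", "TTTA"]
    print(fst_two_populations(locality_a, locality_b))
```

Values near zero indicate little differentiation between localities or host species, while values near one indicate strong subdivision. With small samples this estimator can also return negative values, which are usually read as no detectable structure.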
All flies were identified as Cyclopodia horsfieldi with the exception of two individuals of Eucampsipoda sundaica. Low levels of population genetic structure were detected between populations of Cyclopodia horsfieldi from across a wide geographic range (~1000 km), and tests for isolation by distance were rejected. AMOVA results support a lack of geographic and host-specific population structure, with molecular variance primarily partitioned within populations. Pairwise FST values from flies collected from island populations of Pteropus hypomelanus in East and West Peninsular Malaysia supported predictions based on previous studies of host genetic structure.
The lack of population genetic structure and morphological variation observed in Cyclopodia horsfieldi is most likely due to frequent contact between flying fox species and subsequent high levels of parasite gene flow. Specifically, we suggest that Pteropus vampyrus may facilitate movement of bat flies between the three Pteropus species in the region. We demonstrate the utility of parasite genetics as an additional layer of information to measure host movement and interspecific host contact. These approaches may have wide implications for understanding zoonotic, epizootic, and enzootic disease dynamics. Bat flies may play a role as vectors of disease in bats, and their competence as vectors of bacterial and/or viral pathogens is in need of further investigation.
Bartonella; Connectivity; Diptera; Flying fox; Ectoparasite; Emerging infectious disease; Gene flow; Nipah virus; Nycteribiidae; Pathogens; Phylogeography
The enigmatic animal phylum Placozoa holds a key position in the metazoan Tree of Life. A simple bauplan makes it appear to be the most basal metazoan known and genetic evidence also points to a position close to the last common metazoan ancestor. Trichoplax adhaerens is the only formally described species in the phylum to date, making the Placozoa the only monotypic phylum in the animal kingdom. However, recent molecular genetic as well as morphological studies have identified a high level of diversity, and hence a potential high level of taxonomic diversity, within this phylum. Different taxa, possibly at different taxonomic levels, are awaiting description. In this review we firstly summarize knowledge on the morphology, phylogenetic position and ecology of the Placozoa. Secondly, we give an overview of placozoan morphological and genetic diversity and finally present an updated distribution of placozoan populations. We conclude that there is great potential and need to erect new taxa and to establish a firm system for this taxonomic tabula rasa.
Glucosyltransferases (Gtfs) catalyze the synthesis of glucans from sucrose and are produced by several species of lactic-acid bacteria. The oral bacterium Streptococcus mutans produces large amounts of glucans through the action of three Gtfs. GtfD produces water-soluble glucan (WSG), GtfB synthesizes water-insoluble glucans (WIG) and GtfC produces mainly WIG but also WSG. These enzymes, especially those synthesizing WIG, are of particular interest because of their role in the formation of dental plaque, an environment where S. mutans can thrive and produce lactic acid, promoting the formation of dental caries. We sequenced the gtfB, gtfC and gtfD genes from several mutans streptococcal strains isolated from the oral cavity of humans and searched for their homologues in strains isolated from chimpanzees and macaque monkeys. The sequence data were analyzed in conjunction with the available Gtf sequences from other bacteria in the genera Streptococcus, Lactobacillus and Leuconostoc to gain insights into the evolutionary history of this family of enzymes, with a particular emphasis on S. mutans Gtfs. Our analyses indicate that streptococcal Gtfs arose from a common ancestral progenitor gene, and that they expanded to form two clades according to the type of glucan they synthesize. We also show that the clade of streptococcal Gtfs synthesizing WIG appeared shortly after the divergence of viviparous, dentate mammals, which potentially contributed to the formation of dental plaque and the establishment of several streptococci in the oral cavity. The two S. mutans Gtfs capable of WIG synthesis, GtfB and GtfC, are likely the product of a gene duplication event. We dated this event to coincide with the divergence of the genomes of ancestral early primates. Thus, the acquisition and diversification of S. mutans Gtfs predates modern humans and is unrelated to the increase in dietary sucrose consumption.
The evolution of the diverse insect lineages is one of the most fascinating issues in evolutionary biology. Despite extensive research in this area, the resolution of insect phylogeny, especially of interordinal relationships, remains a great challenge. One of the challenges for insect systematics is the radiation of the polyneopteran lineages with several contradictory and/or unresolved relationships. Here, we provide the first transcriptomic data for three enigmatic polyneopteran orders (Dermaptera, Plecoptera, and Zoraptera) to clarify one of the most debated issues among higher insect systematics. We applied different approaches to generate three data sets comprising 78 species and 1,579 clusters of orthologous genes. Using these three matrices, we explored several key mechanistic problems of phylogenetic reconstruction including missing data, matrix selection, gene and taxa number/choice, and the biological function of the genes. Based on the first phylogenomic approach including these three ambiguous polyneopteran orders, we provide here conclusive support for monophyletic Polyneoptera, contesting the hypothesis of Zoraptera + Paraneoptera and Plecoptera + remaining Neoptera. In addition, we employ various approaches to evaluate data quality and highlight problematic nodes within the Insect Tree that still exist despite our phylogenomic approach. We further show how the support for these nodes or alternative hypotheses might depend on the taxon- and/or gene-sampling.
polyneoptera; zoraptera; dermaptera; plecoptera; data quality
Aggregatibacter actinomycetemcomitans is implicated in localized aggressive periodontitis. We report the first genome sequence of an A. actinomycetemcomitans strain isolated from an Old World primate.
Alpha human papillomaviruses (HPVs) are among the most common sexually transmitted agents, a subset of which causes cervical neoplasia and cancer in humans. Alpha-PVs have also been identified in non-human primates, although few studies have systematically characterized such mucosal PVs. We cloned and characterized 10 distinct types of PVs from exfoliated cervicovaginal cells from different populations of female cynomolgus macaques (Macaca fascicularis) originating from China and Indonesia. These include 5 novel genotypes and 5 previously identified genotypes found in rhesus (Macaca mulatta) (RhPV-1, RhPV-a, RhPV-b and RhPV-d) and cynomolgus macaques (MfPV-a). Type-specific primers were designed to amplify the complete PV genomes using an overlapping PCR method. Four MfPVs were associated with cervical intraepithelial neoplasia (CIN). The most prevalent virus type was MfPV-3 (formerly RhPV-d), which was identified in 60% of animals with CIN. In addition, the complete genomes of variants of MfPV-3 and RhPV-1 were characterized. These variants are 97.1% and 97.7% similar to the prototype genomes across the L1 nucleotide sequences, respectively. Sequence comparisons and phylogenetic analyses indicate that these novel MfPVs cluster together within the alpha PV α12 species closely related to the α9 (e.g., HPV16) and α11 species (e.g., HPV34), and all share a most recent common ancestor. Our data expand the molecular diversity of non-human primate PVs and suggest the recent expansion of alpha PV species groups. Moreover, identification of an overlapping set of MfPVs in rhesus and cynomolgus macaques indicates that non-human primate alpha PVs might not be strictly species specific and that “subtypes” may represent recent divergence of host species or past interspecies infection.
alpha papillomavirus; Macaca fascicularis; novel PVs; genomic diversity; evolution
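The L1 similarity figures quoted in the abstract above (97.1% and 97.7%) are pairwise nucleotide comparisons. As a minimal sketch of how such a percentage can be computed from two sequences that have already been aligned, the Python function below counts matching columns. The function name `percent_identity`, the choice to skip gapped columns, and the toy sequences are all assumptions for illustration; the source does not state which alignment tool or gap convention the authors used.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where either sequence has a gap ('-') are skipped. This is one
    of several common conventions and is an assumption of this sketch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # ignore gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned fragments (hypothetical, not real L1 sequences).
prototype = "ACGTACGTAC"
variant = "ACGTACGTAA"
print(f"{percent_identity(prototype, variant):.1f}% identical")
```

Counting gaps as mismatches instead of skipping them would lower the reported value, so the convention should be stated alongside any figure computed this way.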
The international wildlife trade is a key threat to biodiversity. Temporal genetic marketplace monitoring can determine if wildlife trade regulation efforts such as the Convention on International Trade in Endangered Species (CITES) are succeeding. Protected under CITES effective 1997, sturgeons and paddlefishes, the producers of black caviar, are flagship CITES species.
We test whether CITES has limited the amount of fraudulent black caviar reaching the marketplace. Using mitochondrial DNA-based methods, we compare mislabeling in caviar and meat purchased in the New York City area before and after the CITES listing. Our recent sampling of this market reveals a decrease in mislabeled caviar (2006–2008; 10%; n = 90) compared to pre-CITES implementation (1995–1996; 19%; n = 95). Mislabeled caviar was found only in online purchases (n = 49 online/41 retail).
Stricter controls on importing and exporting as per CITES policies may be having a positive conservation effect by limiting the amount of fraudulent caviar reaching the marketplace. Sturgeons and paddlefishes remain a conservation priority, however, due to continued overfishing and habitat degradation. Other marine and aquatic species stand to benefit from the international trade regulation that can result from CITES listing.
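The caviar abstract above reports mislabeling rates of 10% (n = 90) after the CITES listing and 19% (n = 95) before it, without naming a statistical test. The Python sketch below shows one way a reader might compare two such proportions, using a one-sided Fisher's exact test built from the standard library. The whole-number counts are back-calculated from the rounded percentages, and the choice of test is ours; both are assumptions and neither comes from the source.

```python
from math import comb


def count_from_percent(percent: float, n: int) -> int:
    """Back-calculate a whole-number count from a rounded percentage (an assumption)."""
    return round(percent / 100 * n)


def hypergeom_pmf(k: int, total: int, successes: int, draws: int) -> float:
    """Probability of exactly k successes in `draws` draws without replacement."""
    return comb(successes, k) * comb(total - successes, draws - k) / comb(total, draws)


def fisher_lower_tail(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Returns the probability of seeing `a` or fewer events in the first row
    when the row and column totals are held fixed.
    """
    total = a + b + c + d
    events = a + c        # all mislabeled samples, both periods
    first_row = a + b     # size of the first sample
    lowest = max(0, first_row - (total - events))
    return sum(hypergeom_pmf(k, total, events, first_row) for k in range(lowest, a + 1))


# Counts inferred from the reported percentages; not taken from the source.
post_bad = count_from_percent(10, 90)  # 2006-2008 sample
pre_bad = count_from_percent(19, 95)   # 1995-1996 sample

p_value = fisher_lower_tail(post_bad, 90 - post_bad, pre_bad, 95 - pre_bad)
print(f"mislabeled: {post_bad}/90 after vs {pre_bad}/95 before listing")
print(f"one-sided Fisher exact p = {p_value:.3f}")
```

An exact test is used here because the mislabeled counts are small; with the true counts from the study the result could differ from what these inferred counts give.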
Human DOR/TP53INP2 displays a unique bifunctional role as a modulator of autophagy and gene transcription. However, the domains or regions of DOR that participate in those functions have not been identified. Here we have performed structure/function analyses of DOR guided by identification of conserved regions in the DOR gene family by phylogenetic reconstructions. We show that DOR is present in metazoan species. Invertebrates harbor only one gene, DOR/Tp53inp2, and in the common ancestor of vertebrates Tp53inp1 may have arisen by gene duplication. In keeping with these data, we show that human TP53INP1 regulates autophagy and that different DOR/TP53INP2 and TP53INP1 proteins display transcriptional activity. The use of molecular evolutionary information has been instrumental in determining the regions that participate in DOR functions. DOR and TP53INP1 proteins share two highly conserved regions (region 1, aa residues 28–42; region 2, 66–112 in human DOR). Mutation of conserved hydrophobic residues in region 1 of DOR (which are part of a nuclear export signal, NES) reduces transcriptional activity, and blocks nuclear exit and autophagic activity under autophagy-activated conditions. We also identify a functional and conserved LC3-interacting motif (LIR) in region 1 of DOR and TP53INP1 proteins. Mutation of conserved acidic residues in region 2 of DOR reduces transcriptional activity, impairs nuclear exit in response to autophagy activation, and disrupts autophagy. Taken together, our data reveal DOR and TP53INP1 as dual regulators of transcription and autophagy, and identify two conserved regions in the DOR family that concentrate multiple functions crucial for autophagy and transcription.
A novel result of the current research is the development and implementation of a unique functional phylogenomic approach that explores the genomic origins of seed plant diversification. We first use 22,833 sets of orthologs from the nuclear genomes of 101 genera across land plants to reconstruct their phylogenetic relationships. One of the more salient results is the resolution of some enigmatic relationships in seed plant phylogeny, such as the placement of Gnetales as sister to the rest of the gymnosperms. In using this novel phylogenomic approach, we were also able to identify overrepresented functional gene ontology categories in genes that provide positive branch support for major nodes, prompting new hypotheses for genes associated with the diversification of angiosperms. For example, RNA interference (RNAi) has played a significant role in the divergence of monocots from other angiosperms, which has experimental support in Arabidopsis and rice. This analysis also implied that the second largest subunit of RNA polymerase IV and V (NRPD2) played a prominent role in the divergence of gymnosperms. This hypothesis is supported by the lack of 24-nt siRNA in conifers, the maternal control of small RNA in the seeds of flowering plants, and the emergence of double fertilization in angiosperms. Our approach takes advantage of genomic data to define orthologs, reconstruct relationships, and narrow down candidate genes involved in plant evolution within a phylogenomic view of species' diversification.
Understanding the genetic and genomic basis of plant diversification has been a major goal of evolutionary biologists since Darwin first pondered his “abominable mystery,” the rapid diversification of the angiosperms in the fossil record. We develop and deploy a functional phylogenomic approach that helps identify genes and biological processes putatively involved in species diversification. We assembled a matrix of 22,833 orthologs from 150 species to reconstruct seed plant phylogenetic relationships and to identify gene sets with a unique evolutionary signal. Our analysis of overrepresented biological processes in these sets narrowed down possible genetic mechanisms underlying plant adaptation and diversification. The phylogenetic relationships we uncovered support the hypothesis that gnetophytes are closely related to the rest of the gymnosperms at the base of the living seed plants. We also found that genes involved in post-transcriptional silencing via RNA interference (RNAi)—increasingly important in understanding plant evolution—are significantly represented early in angiosperm and gymnosperm divergence, with an apparent loss of specific classes of small interfering RNAs (siRNA) in gymnosperms. Our functional phylogenomic approach can be applied to any taxa with available sequences to enhance our knowledge of the evolutionary processes underlying biodiversity in general.